This webpage provides supporting information for the manuscript “Phylogenetic generalised linear mixed-effects modelling with the glmmTMB R package”, where we showcase the propto (proportional-to) covariance structure in the glmmTMB R package (McGillycuddy, 2023; Kristensen & McGillycuddy, 2025).
This webpage is organised in three sections:
First, we describe the simulated models and data-generating mechanisms for each of the five evaluated R packages for fitting PGLMMs (see Models).
We then use the simulated data to illustrate compilation and sampling runtimes for the Bayesian MCMC models (see Runtimes).
Finally, we present two case studies that apply these methods to ecological and evolutionary data, the first on bird colour traits and the second on plant traits (see Case studies).
We demonstrate and explain how to fit phylogenetic generalized mixed models (PGLMM) using five different packages which we used in our simulation study: glmmTMB, phyr, MCMCglmm, brms, and INLA. The models are fitted to one simulated dataset using a randomly generated phylogenetic tree and a set of model parameters. We compare the performance of these packages in terms of runtime, accuracy, and bias of the fixed effect mean and random effect variance estimates.
We assume a set of trait measures \(y_{ij}\) repeated measure (observation) \(i\) and for each species \(j\). The model is specified as follows:
\(p_i \sim \mathcal{N}(0, \sigma^2_{\text{p}} \mathbf{A})\): species-level effect (phylogenetic), where \(\mathbf{A}\) is the phylogenetic correlation matrix
First we specify the parameter values for one simulation run. We assume the following:
Code
### set up model parameters seed <-1b0 <-1# fixed effect interceptb1 <-1.5# fixed effect slopek.species <-30# number of speciesn.reps <-10# number of repeated measures per species (assuming a balanced design)sigma2.n <-0.25# variance of non-phylogenetic effectsigma2.p <-0.25# variance of phylogenetic effectsigma2.e <-0.2# residual error variance
Simulate data based on these parameter values:
Code
set.seed(seed)# set up k.obs (number of observations per species)k.obs <- n.reps # assumes a balanced design# species id sp.id <-rep(seq_len(k.species), times=k.obs)# total number of observationsn <-length(sp.id)# simulate simple dataframe with covariate (x) variablex <-runif(n, 10, 20) dat <-data.frame(obs =1:n, x = x, species = sp.id)# simulate tree and obtain phylo matrixtree <- ape::rtree(k.species, tip.label =seq_len(k.species))tree <- ape::compute.brlen(tree, power=1) # power of 1 means ultra-metric treephylo.mat <- ape::vcv(tree, corr =TRUE) # we want a correlation matrix (bounded by -1 and 1) ultrametric treephylo.mat <- phylo.mat[order(as.numeric(rownames(phylo.mat))), order(as.numeric(rownames(phylo.mat)))]# Simulate response variable (phen) based on cofactor and phylogenetic matrixu.s <-rnorm(k.species, 0, sqrt(sigma2.n))[sp.id]u.p <- MASS::mvrnorm(1, mu=rep(0, k.species), Sigma=sigma2.p*phylo.mat)[sp.id]ei <-rnorm(n, 0, sqrt(sigma2.e))# get estimates of yyi <- b0 + b1*x + u.s + u.p + ei### append all to dataframedat <-cbind(dat, u.s, u.p, ei, b0, b1, yi)dat$phylo <- dat$species # phylo ID variable (same as species) - needs to be numeric to work with INLAdat$species <-factor(dat$species) # format species variable for modelsdat$sp <- dat$species # create sp variable (for phyr)dat$g <-1# add variable g constant (for glmmTMB)save(dat, tree, phylo.mat, file ="simulated_data.RData")
To fit a PGLMM using glmmTMB package we specify the species-level random effect with phylogenetic relatedness using the propto covariance structure. In the first part we specify the random intercept (0 + species), followed by the grouping variable g (which we assume here as constant), and then the phylogenetic correlation matrix phylo.mat.
The model output will provide the fixed effect estimates, random effect variance estimates, and the residual variance estimate. The random effect variance estimates will be on the standard deviation scale, so we square them to get the variance estimates.
Code
# repeated measures per speciestime.glmmTMB <-system.time({ model_glmmTMB <-glmmTMB(yi ~ x + (1|species) +propto(0+ species|g, phylo.mat),data = dat,REML =TRUE)})
Check if the model converged (i.e. returns TRUE if the Hessian is positive definite, for more details read here: https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html):
Code
model_glmmTMB$sdr$pdHess
[1] TRUE
We can then get the fixed effect estimates and their confidence intervals:
The random effect component estimates are on the standard deviation scales fron the confint function, so we need to square them to get the variance estimates. We also compute the standard error and confidence intervals on the variance scale using the delta method.
Code
re_tmb <-as.data.frame(confint(model_glmmTMB, parm="theta_")) # on standard deviation scalere_tmb
model group term estimate std.error conf.low conf.high
1 glmmTMB phylo var 0.1319 0.06294 0.05337 0.3258
2 glmmTMB species var 0.4271 0.31301 0.11239 1.6230
3 glmmTMB Residual var 0.2060 NA NA NA
Run model with phyr
To fit the model using phyr package we use the cov_ranef argument to specify the phylogenetic tree directly. The random effect term (1|sp__) indicates that we want to include both phylogenetic and non-phylogenetic random effects. The underscores __ indicate that we want to include phylogenetic correlations in the model.
Obtain the random effect and residual variance estimates. Note that phyr provides them on the standard deviation scale.
Code
# get phyr random effect variance estimates var_re_phyr <-c(as.numeric(model_phyr$ss[2])^2, #phylogenetic as.numeric(model_phyr$ss[1])^2, #non-phylogenetic as.numeric(model_phyr$ss[3])^2) #residual # combine into dataframesigma2_phyr <-data.frame(model ="phyr",group =c("phylo", "species", "Residual"),term ="var",estimate = var_re_phyr,std.error =NA,conf.low =NA, conf.high =NA)sigma2_phyr
model group term estimate std.error conf.low conf.high
1 phyr phylo var 0.3459 NA NA NA
2 phyr species var 0.6390 NA NA NA
3 phyr Residual var 0.2060 NA NA NA
Run model with MCMCglmm
To fit the MCMCglmm model we first we need to set up a precision matrix for the phylogenetic random effect, which (which is the inverse of the phylogenetic covariance matrix). We use the inverseA function the package to obtain the inverse of the phylogenetic covariance matrix.
These priors specify weakly informative settings: both random effects (\(G1\), \(G2\)) use inverse-Wishart style priors (\(V=1\), \(\nu=1\), with \(\alpha.\mu=0\), \(\alpha.V=1000\)), while the residual variance (\(R\)) has an almost flat inverse-Gamma prior (\(V=1\), \(\nu=0.02\)).
The ginverse argument is used to specify the inverse of the phylogenetic covariance matrix. Then we specify the number of iterations, burn-in, and thinning parameters for the MCMC sampling. Here we increased the number of iterations by 30 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased. Usually you would want to set these to some reasonably large values after inspection of diagnostic plots (i.e. trace plots and posterior distributions).
Code
# get precision phylo matrix and order rowsphylo.prec.mat <- MCMCglmm::inverseA(tree, nodes ="TIPS", scale =TRUE)$Ainvphylo.prec.mat <- phylo.prec.mat[order(as.numeric(rownames(phylo.prec.mat))),order(as.numeric(rownames(phylo.prec.mat)))]# set priors for two random effectsprior <-list(G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=1000), G2=list(V=1,nu=1,alpha.mu=0,alpha.V=1000)),R=list(V=1,nu=0.02))# fit MCMCglmm modeltime.mcmc <-system.time({ model_mcmc <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt =303000, # increase default by x30burnin =3000, # defaultthin =10) # default})
Plot the trace plots of the parameters to check for convergence. We expect to see good mixing of the chains and no trends.
Code
plot(model_mcmc$Sol, ask =FALSE)
Next we check the effective sample size (ESS). The ESS should be above 400 for both fixed and random effects and the Rhat value should be below 1.01 (Vehtari et al., 2021).
We can also check the Heidelberger diagnostic test which checks if the MCMC chains have converged and are stationary ( we expect to see ‘passed’ for all parameters). In practice it is important to inspect diagnostic plots of the Markov chains and posterior distributions.
# A tibble: 3 × 7
model group term estimate std.error conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 MCMCglmm species var__(Intercept) 0.149 0.0812 0.0258 0.344
2 MCMCglmm phylo var__(Intercept) 0.660 0.508 0.104 2.03
3 MCMCglmm Residual var__Observation 0.208 0.0180 0.175 0.245
Run model with brms
To fit the model using the brms package we specify the random effects using the (1|species) term for non-phylogenetic random effects and gr(phylo, cov = phylo.mat) for phylogenetic random effects. The data2 argument is used to pass the phylogenetic correlation matrix to the model (the same matrix format as for glmmTMB). We set the number of iterations, chains, and cores to reasonable values to ensure convergence and stability of the MCMC chains. Here we increased the number of iterations by 10 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased (and should be if ESS are low and Rhat values are higher than 1.01).
Code
time.brms <-system.time({ model_brms <-brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)), #phylo.mat is the correlation matrixdata = dat,family =gaussian(),chains =4, # defaultiter =20000, # increased default x10cores =4, # equal to number of chainsdata2 =list(phylo.mat = phylo.mat))})
Look at the trace plots of the parameters to check for convergence. We expect to see good mixing of the chains and no trends.
Code
plot(model_brms, N =2, ask =FALSE)
Check that the maximum effective sample size (ESS) of the model is high enough and check the Rhat value is below 1.01 (Vehtari et al. 2021).
# check RHat value is below 1.01 (Vehtari et al. 2021)max(rhat(model_brms))<1.01
[1] TRUE
Code
# alternatively, we can use the bayestestR package to check the diagnostics of fixed effectsprint(bayestestR::diagnostic_posterior(model_brms), digits =4)
For the INLA package we set up the model using the f() function to specify the random effects. The model = "iid" argument indicates that we want to use an independent and identically distributed (iid) random effect for species, while the model = "generic0" argument is used for the phylogenetic random effect. The Cmatrix argument is used to specify the phylogenetic correlation matrix, which should be a precision matrix similar to MCMCglmm. We note that priors can be incorporated for parameter using the argument hyper=.
Code
time.inla <-system.time({ model_inla <-inla(yi ~ x +f(species, model ="iid") +f(phylo, ## this needs to be a numeric to workmodel ="generic0",Cmatrix = phylo.prec.mat),family ="gaussian",data = dat)})fit_inla <-summary(model_inla)
model group term estimate std.error conf.low conf.high
1 INLA Residual var 0.2045 0.421 0.2435 0.17358
2 INLA species var 0.1177 4.134 0.3388 0.05315
3 INLA phylo var 0.1999 4.717 1.1547 0.05707
Additional note: if a model returns extreme or unreasonable variance estimates, it is important to check robustness by refitting the same model with another package. In INLA, centering the random effect estimates (by specifying const = TRUE within the random effect term) or trying higher starting values (default is 4, e.g. hyper = list(prec=list(initial=8))) to see if the estimates converge to more reasonable values. Random effect variance estimates are often unstable, so they should be interpreted with care.
Model estimates
Summary of model runtime, covariate \(x\) estimate and it’s confidence interval for each model:
Sometimes we only have one measure per species. This may happen because of limited sampling or because repeated observations for a species have already been averaged. In such cases, it is common to work with a single value per species.
Here we will demonstrate how to fit a phylogenetic model when there is one measure per species. We will use the dataset dat of repeated measures per species simulated above (300 observations of 30 species) and compute a mean value for each species.
This model converged, but it returned very wide uncertainty intervals for the non-phylogenetic variance component (on the standard deviation scale). This reflects the fact that, with only one observation per species, the species intercept and the phylogenetic effect are not separately identifiable by the model. In some cases, the model may even fail to converge, highlighting the redundancy of including both terms.
When the non-phylogenetic component is removed, the estimate of the phylogenetic effect remains unchanged, the fixed effects are identical, and the residual variance increases to absorb the variability that can no longer be partitioned between the two random effects.
In some analyses, we may want to include a random slope for the phylogenetic effect. This allows us to ask whether the relationship between a covariate and the response varies among species in a way that reflects their shared evolutionary history. For example, we might be interested in whether species with different body weights show different slopes, or whether the effect of a treatment differs across species in a phylogenetically structured way.
Given the above simulated dataset, we assume a set of trait measures \(y_{ij}\) repeated measure (observation) \(i\) and for each species \(j\). The phylogenetic random slope model assuming independence is specified as follows:
What if there is a correlation between random intercept and slope?
glmmTMB cannot currently estimate correlated random intercepts and slopes when using the propto covariance structure. This means the intercept–slope covariance is fixed at zero and cannot be separated out. A common workaround is to centre the covariate, which reparameterises the model so that the intercept represents the expected response at the mean covariate value. This improves interpretability and typically stabilises slope variance estimates, though it does not recover any true intercept–slope correlation.
If the data contain little or no correlation, the variance estimates from glmmTMB with centred covariates will usually be close to those from packages that estimate the correlation. However, if the correlation is strong, glmmTMB may distort the partitioning of variance because the shared variation has no explicit covariance term.
For comparison, the random-effects structure in a model with correlated random intercepts and slopes is:
Recommendation: fit models in both glmmTMB and in Bayesian frameworks such as the brms and MCMCglmm packages in R. Use globally centred covariates in both cases: centring helps stabilise estimation in glmmTMB and also improves interpretability in Bayesian models, though Bayesian approaches can additionally estimate the correlation directly.
If there is truly little or no correlation, the results will look very similar to glmmTMB: the variance estimates will be close, and the correlation will be near zero.
If there is strong correlation, the slope and intercept variance estimates in glmmTMB may still approximate the overall variability, but they could be biased, because part of the shared variation is forced into the variance components rather than partitioned as covariance.
This section is still a work in progress.
Phylogenetic trees are always approximations and never represent the “true” evolutionary history with complete accuracy. In some cases, however, we may have access to multiple candidate trees. These can be used to fit separate models and then combine the results, for example using Rubin’s rules for multiple imputation.
Here, we illustrate this approach with the simulated repeated-measures species dataset. As a first step, we generate 50 random phylogenetic trees to represent a set of hypothetical candidate trees:
Code
set.seed(123)# simulate 50 random trees trees <-replicate(50, ape::rtree(k.species, tip.label = sp.id), simplify =FALSE)# now make a phylogenetic correlation matrix for each tree,make_phylo_mat <-function(tree, sp) { C <-vcv(tree, corr =TRUE) ord <-match(sp.id, rownames(C)) C[ord, ord, drop =FALSE]}phylo_mats <-lapply(trees, make_phylo_mat, sp = sp.id)
We fit the same model across 50 candidate trees and pool fixed effects using Rubin’s rules (average within-tree variance plus between-tree variance):
For the random-effect SDs and the residual variance across trees [….]
3 Bayesian MCMC model runtime
Here we provide a breakdown of the compilation and sampling runtimes for the Bayesian MCMC models using the above simulated repeated measures dataset. The runtimes are provided for each model fitted using the MCMCglmm and brms packages. Note: the compilation time is the time taken to compile the model, while the sampling time is the time taken to sample from the posterior distribution of the model parameters.
First we set up the MCMCglmm model.
MCMCglmm
Set up MCMCglmm tuning parameters - usually set to some reasonably large values through trial and error.
The following case studies illustrate how phylogenetic mixed models can be applied to diverse questions in ecology and evolution. The first examines patterns of plumage colour in birds, while the second explores macroevolutionary patterns in plant traits. Together, they demonstrate the flexibility of these methods across taxa and research questions.
All models, results, and datasets presented in these case studies are intended for illustrative purposes only. They are designed to demonstrate analytical approaches rather than to provide definitive biological insights. No substantive conclusions should be drawn from these analyses.
4.2 Case study 1: Bird color evolution
Birds are a valuable system for studying ecological and evolutionary processes because their diversity in form, behaviour, and coloration is well documented across many species and habitats. Variation in plumage colour, in particular, can provide insights into mechanisms such as sexual selection, camouflage, and signalling. By examining these traits, researchers can better understand how environmental and evolutionary pressures shape biodiversity.
credits: Andy Chilton, Boris Smokrovic, David Clode, Jaime Spaniol, Joshua J. Cotte, Vincent Van Zalinge / unsplash.com.
Data overview
Load data bird spectrum data:
Code
# Load bird spectrum data bird_data <-read_delim("data/Spec_IndivReg_Coralie.csv", delim =";", escape_double =FALSE, locale =locale(decimal_mark =","), trim_ws =TRUE)# summary of birds species and genuslength(table(bird_data$genus_original)) # 446 genus
[1] 446
Code
length(table(bird_data$sci_name_Jetz)) # 949 species
[1] 949
Code
#length(table(bird_data$sci_name_original)) # 952 original species# Remove the specified species from bird_databird_data <- bird_data |>filter(!sci_name_Jetz %in%c("Basileuterus_rufifrons", "Malurus_lamberti", "Malurus_splendens"))########## Notes ########## missing values ---> not sure why?#na_count <- colSums(is.na(dat2))#na_count[na_count > 0]# individuals a-c are male and d-f are female#table(bird_data$sex, bird_data$individual)# number of measurements?#table(bird_data$Nmeasured)# duplicates per individual per body region#dup <- dat |># summarise(n = n(), # .by = c(wl, individual_nonrep, sex, body_region))#r <- dup[which(dup$n>1),]#table(r$individual_nonrep)
Some notes:
The dataset contains data for 949 bird species, with 446 unique genera.
Spectral values are normalised reflectance data.
Nmeasured is the number of measurements per body patch.
Each body region measured 5 times (then averaged).
These species seem to have two measurements per body region per individual, which we remove to facilitate the analysis.:
Now we want to create a wavelength dataset using the pavo package given the spectral reflectance data.
Code
# set up wavelength dataset (wavelengths columns 300 to 474)dat <- bird_data |>pivot_longer(cols =`300`:`700`, names_to="wl", values_to="refl") |> dplyr::select(individual_nonrep, body_region, sex, wl, refl) |>mutate(wl =as.numeric(wl))# pivot so each column is an individual body region measurementdat2 <- dat |>pivot_wider(names_from =c("individual_nonrep", "sex", "body_region"),values_from ="refl", names_sep =".")# create spectral dataset with pavospecs <-as.rspec(dat2)# some examples of individual birds body region color spectrumplot(Acrocephalus_palustris_a.Male.throat ~ wl, type="l", data=specs, title="Acrocephalus palustris (throat)")
Code
plot(Alle_alle_a.Male.throat ~ wl, type="l", data=specs, title="Alle alle (throat)")
Code
plot(Aplonis_metallica_a.Male.wing_cov ~ wl, type ="l", data = specs, title="Aplonis metallica (wing covert)")
Code
# use procspec to adjust negative values by shifting them by 10# (this maybe due to darker colors)specs <-procspec(specs, fixneg="addmin")plot(specs, select =10)
Obtain spectral shape descriptors
Using the wavelength dataset we can now obtain spectral shape descriptors. These descriptors will be used in the model. Here we will focus on four descriptors of interest: brightness (B1), spectral slope (S1), spectral curvature (S9), and hue (H4). More information about the spectral shape descriptors can be found here: https://book.colrverse.com/spectral-shape-descriptors.html
Code
# Obtain spectral shape descriptorsspec.des <-summary(specs, subset =c("B1", "B2", "S9", "H4")) ## using subset makes it faster to run by selecting the shapes of interestdev.off()
null device
1
Code
# distribution of B1ggplot(spec.des, aes(x = B1)) +geom_histogram(bins =30, colour ="black", fill ="grey") +labs(x ="B1", title ="B1: Total brightness") +theme_bw()# distribution of S9ggplot(spec.des, aes(x = S9)) +geom_histogram(bins =30, colour ="black", fill ="grey") +labs(x ="S9", title ="S9: Carotenoid chroma") +theme_bw()# vdistribution o fH4ggplot(spec.des, aes(x = H4)) +geom_histogram(bins =30, colour ="black", fill ="grey") +labs(x ="H4", title ="H4: Hue (segment classification)") +theme_bw()
The first dataset is the shape descriptors dataset, which we will use to model the continuous trait (brightness).
Obtain carotenoid datasets: binary and ordinal traits
We will use the carotenoid dataset to model the presence/absence of carotenoid coloration in birds. This dataset is based on the spectral reflectance data and the spectral shape descriptors.
Let’s set up the dataset for modelling a binary trait (absence/presence of carotenoid color):
Code
# 1) get human visual model (CIE 10 degree observed under D65 "day light")vm <-vismodel(specs, visual ="cie10", illum ="D65", bkg ="ideal", relative =FALSE)# 2) convert to CIELAB colorspace (to get hue and chroma)lab <-colspace(vm, space ="cielab") # 3) get chroma and hue angle (converting CIELAB to CIELCh)L <- lab$La <- lab$ab <- lab$bC <-sqrt(a^2+ b^2)h <- (atan2(b, a) *180/ pi) %%360# degrees in 0 to 360# 4) thresholds for carotenoid range and saturationh_lower <-330# start of redh_upper <-100# end of yellowC_min <-15# min chromaL_min <-20# avoid very dark samples# 5) Hue range test in_range <- h >= h_lower | h <= h_upper# 6) COmpute binary trait: 1 = carotenoid like colour present, 0 = absentpresent <-as.integer(in_range & C >= C_min & L >= L_min)# set up datasetcarot.dat.all <-data.frame(rowname =rownames(lab),L = L, a = a, b = b, C = C, h = h,carotenoid = present)# merge with spec.data based on "rowname" columncarot.dat.all <-merge(spec.dat, carot.dat.all, by.x ="rowname", by.y ="rowname", all =TRUE)# create individual id variablecarot.dat.all$indiv_rep <-sub(".*_([a-f])\\..*$", "\\1", carot.dat.all$rowname)# number of observations with carotenoid like colourtable(carot.dat.all$carotenoid)
carot.dat <- carot.dat.all# save as csvwrite.csv(carot.dat, file="data/carotenoid_data_for_model.csv", row.names =FALSE)
Now let’s obtain the dataset where we summarise for each individual bird the proportion of body region with carotenoid color presence (ordinal data trait), and save it for modelling later on:
Code
carot.ordinal <- carot.dat.all |>group_by(species, sex, indiv_rep) |>summarise(prop_carotenoid =mean(carotenoid), .groups ="drop")# save as csvwrite.csv(carot.ordinal, file="data/ordinal_data_for_model.csv", row.names =FALSE)
Phylogenetic correlation matrix set-up
Load and view phylogenetic tree:
Code
# Load bird tree (consensus tree = "combined tree")bird.tree <-read.tree("data/Stage2_Hackett_MCC_no_neg.tre")### Prune bird treebird.pruned <-keep.tip(bird.tree, bird_data$sci_name_Jetz)# check whether names match in data and treecheck <-name.check(bird.pruned, bird_data$sci_name_Jetz, sort(bird.pruned$tip.label))# plot treeplotTree(bird.pruned, ftype="i", fsize=0.4, lwd=1, type="fan")
Code
dev.off()
null device
1
Set up correlation matrix for glmmTMB model and check it corresponds to the species labels in the data:
Code
# set up phylogenetic correlation matrixphylo.mat <-vcv(bird.pruned, corr =TRUE) phylo.mat <- phylo.mat[sort(rownames(phylo.mat)), sort(rownames(phylo.mat))]saveRDS(phylo.mat, file ="data/phylo_matrix.rds")# checks # length(colnames(phylo.mat))==length(table(spec.dat$species))# all(head(rownames(phylo.mat))==head(colnames(phylo.mat)))# head(table(spec.dat$species))
Model 1. Continuous trait
Code
# load dataspec.dat <-read_csv("data/spec_data_for_model.csv")phylo.mat <-readRDS("data/phylo_matrix.rds")# load library library(glmmTMB)# add grouping variable (set it to 1) - this is necessary to fit the glmmTMB modelspec.dat$g <-1
Modelling total brightness (B1)
The brightness trait is right-skewed (as shown above), which is consistent with multiplicative evolutionary change. To identify an appropriate sampling distribution, we will fit four models with an identical linear predictor and random effects structure:
Gaussian to model \(\log(B1)\)
Gamma with a log link to model \(B1\)
For each model we will examine simulated standardised residuals using to assess nonlinearity and heteroscedasticity. If two or more models perform similarly, we will prefer the model with clearer interpretation on the original scale.
Code
# Fit models --------------# normaltime_norm <-system.time( m1 <-glmmTMB(log(B1) ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =gaussian(),data = spec.dat) )# Gamma distributiontime_gamma <-system.time( m2 <-glmmTMB(B1 ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =Gamma(link ="log"),data = spec.dat))# Get model info ---------------------------# check whether model has postive definite hessianb1_output <-data.frame(model =c("Gaussian (log)", "Gamma"),convergence =c(m1$sdr$pdHess, m2$sdr$pdHess),runtime =c(time_norm[["elapsed"]], time_gamma[["elapsed"]]))b1_output
We found that the Gamma distribution shows improve model fit in the residual plots.
Model estimates
Let’s obtain the estimate of the phylogenetic signal and non-phylogenetic signal for the Gamma model:
Code
# get random effect component estimates on SD-scale m2_re <-as.data.frame(confint(m2, parm="theta_"))sigma2_s <- m2_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m2_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal
[1] 0.842
Code
np.signal <-1-p.signalnp.signal
[1] 0.158
On the log mean scale of the Gamma model, the phylogenetic species effect explained 84.2% of the total species level random effect variance i.e. this represents the degree of phylogenetic signal in the overall variance sourced from species. The non-phylogenetic effect explained 15.8% of the total species level random effect variance.
We can obtain and compare the residual variances for each model with the following:
Code
# residual variance of gaussian modelm1_res.var <-sigma(m1)^2# residual variance of gamma modelphi <-sigma(m2)^2#phi given on SD-scalescale <-1/phi #derive scale parameterm2_res.var <-trigamma(scale)# printdata.frame(model =c("Gaussian (log)", "Gamma"),residual_variance =c(m1_res.var, m2_res.var))
model residual_variance
1 Gaussian (log) 0.3265
2 Gamma 0.3319
To assess differences in total brightness between sexes for each body region, we compute marginal means from the fitted model and perform pairwise comparisons between females and males. The resulting ratios (female/male) and their confidence intervals are then plotted to visualise the magnitude and direction of differences across body regions.
Code
emm_b1 <-emmeans(m2, ~ sex | body_region, type ="response")# get contrasts (ratios) back transformed on response scale with CIb1_sex_diff <-contrast(emm_b1, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")b1_sex_diff
body_region = back:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
female / male 0.973 0.0139 Inf 0.946 1.000 1 -1.923 0.0545
body_region = belly:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
female / male 1.069 0.0153 Inf 1.040 1.100 1 4.708 <.0001
body_region = cov:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
female / male 0.998 0.0143 Inf 0.970 1.026 1 -0.141 0.8882
body_region = crown:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
female / male 0.912 0.0131 Inf 0.887 0.938 1 -6.390 <.0001
body_region = tail:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
female / male 1.020 0.0145 Inf 0.992 1.050 1 1.427 0.1537
body_region = throat:
contrast ratio SE df asymp.LCL asymp.UCL null z.ratio p.value
female / male 1.075 0.0154 Inf 1.046 1.106 1 5.099 <.0001
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Tests are performed on the log scale
Code
## plot pairwise differences between female and male total brightnessb1_sex_diff |>arrange(ratio) |>mutate(body_region =factor(body_region, unique(body_region), ordered =TRUE)) |>ggplot(aes(y = body_region, x = ratio)) +geom_point(size =3, colour ="#7B5EA9") +geom_errorbar(aes(xmin = asymp.LCL, xmax = asymp.UCL), width =0.2, colour ="#7B5EA9") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y ="Body Region", x ="Female / Male Ratio", title ="Brightness") +theme_bw() +theme(legend.position ="none")
Ratios \(>1\) indicate greater brightness in females, while ratios \(<1\) indicate greater brightness in males.
Model 2. Binary trait
Binary colour traits, such as the presence of carotenoid colours, may be associated with distinct evolutionary and ecological drivers.
Here we want to model the absence or presence of carotenoid across all body regions accounting for sex and including species- and phylogeny-level random effects. We will use the carotenoid dataset which we obtained earlier.
Modelling carotenoid presence
We will fit two models: - Binomial model with a logit link function - Beta binomial model with a logit link function
Code
# Fit models --------------# load datcarot.dat <-read_csv("data/carotenoid_data_for_model.csv")carot.dat$g <-1# binomial modeltime_binom <-system.time( m3 <-glmmTMB(carotenoid ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =binomial(link ="logit"),data = carot.dat))# beta binomialtime_bbinom <-system.time( m4 <-glmmTMB(carotenoid ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =betabinomial(link ="logit"),data = carot.dat))# Get model info -----------carotmod_output <-data.frame(model =c("Binomial", "Beta binomial"),convergence =c(m3$sdr$pdHess, m4$sdr$pdHess),runtime =c(time_binom[["elapsed"]], time_bbinom[["elapsed"]]),AIC =c(AIC(m3), AIC(m4)))carotmod_output
We found that the binomial model has improved model fit in AIC and in the residual plots.
Model estimates
Let’s obtain the estimate of the phylogenetic signal for this model:
Code
# get random effect component estimates on SD-scalem3_re <-as.data.frame(confint(m3, parm="theta_"))sigma2_s <- m3_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m3_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal
[1] 0.9243
Get model estimated marginals means on the response scale:
Code
# Estimated marginal means on the response scale (odds)emm_m3 <-emmeans(m3, ~ sex | body_region, type ="response")# get contrasts (odds ratios)m3_sex_prob <-contrast(emm_m3, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")# Put into data frame for plottingsex_prob_df <-as.data.frame(m3_sex_prob) |>rename(ratio = odds.ratio, lower.CL = asymp.LCL, upper.CL = asymp.UCL)sex_prob_df
body_region = back:
contrast ratio SE df lower.CL upper.CL null z.ratio p.value
female / male 0.0000 0.00005 Inf 0.0000 Inf 1 -0.007 0.9947
body_region = belly:
contrast ratio SE df lower.CL upper.CL null z.ratio p.value
female / male 0.9353 0.13585 Inf 0.7036 1 1 -0.460 0.6453
body_region = cov:
contrast ratio SE df lower.CL upper.CL null z.ratio p.value
female / male 0.4490 0.21091 Inf 0.1788 1 1 -1.705 0.0883
body_region = crown:
contrast ratio SE df lower.CL upper.CL null z.ratio p.value
female / male 0.7303 0.24013 Inf 0.3834 1 1 -0.956 0.3391
body_region = tail:
contrast ratio SE df lower.CL upper.CL null z.ratio p.value
female / male 0.5335 0.27658 Inf 0.1931 1 1 -1.212 0.2255
body_region = throat:
contrast ratio SE df lower.CL upper.CL null z.ratio p.value
female / male 0.7066 0.18028 Inf 0.4285 1 1 -1.361 0.1734
Confidence level used: 0.95
Intervals are back-transformed from the log odds ratio scale
Tests are performed on the log odds ratio scale
The ordinal trait is the percentage of body region with carotenoid presence. We will fit a beta-binomial and model to this trait, which is suitable for modeling proportions.
# get random effect component estimates on SD-scalem5_re <-as.data.frame(confint(m5, parm="theta_"))sigma2_s <- m5_re$Estimate[1]^2# non-phylogenetic variance est.sigma2_p <- m5_re$Estimate[2]^2# phylogenetic variance est.p.signal.m5 <- sigma2_p / (sigma2_p + sigma2_s)p.signal.m5
[1] 0.9666
Now look at the difference in the proportion of carotenoid colour between females and males from the ordinal beta model (m5), and plot the model-based estimate with its 95% confidence interval on the response scale.
Code
emm_m5 <-emmeans(m5, ~ sex, type ="response")# pairwise contrast: Female vs Malem5_sex_OR <-contrast(emm_m5, method ="pairwise") |>summary(infer =TRUE, type ="response")# get summary with CIsex_OR <-as.data.frame(m5_sex_OR) |>rename(OR = odds.ratio, lower.CL = asymp.LCL, upper.CL = asymp.UCL)sex_OR
contrast OR SE df lower.CL upper.CL null z.ratio p.value
female / male 0.9169 0.0246 Inf 0.8699 0.9664 1 -3.235 0.0012
Confidence level used: 0.95
Intervals are back-transformed from the log odds ratio scale
Tests are performed on the log odds ratio scale
4.3 Case study 2: Evolution of plant hydraulic traits
We re-analysed the published study of Sanchez-Martinez et al. (2020) on the evolution of plant hydraulic traits using phylogenetic generalized linear mixed models (PGLMMs). The original study used a Bayesian MCMC approach to fit the models with the MCMCglmm package. Here we will use the glmmTMB package to fit the models with different sampling distributions and compare with MCMCglmm. Again all results are for illustrative purpose only and we do not intend to infer any biological or ecological conclusions from the following results.
credits: Chanya_B / stock.adobe.com
Data overview
Code
# Load plant hydraulic traits datasethydra.dat <-read_csv("data/HydraEvol2020.csv")# Have a look at distribution of traitshist(hydra.dat$Ks, breaks =50, main ="Distribution of Ks", xlab ="Ks")
Code
boxplot(hydra.dat$Ks ~ hydra.dat$group, main ="Boxplot of Ks by group", xlab ="Group", ylab ="Ks")
Code
hist(hydra.dat$P50, breaks =50, main ="Distribution of P50", xlab ="P50")
Code
boxplot(hydra.dat$P50 ~ hydra.dat$group, main ="Boxplot of P50 by group", xlab ="Group", ylab ="P50")
Plant species with high saturated hydraulic conductivity (kS) are able to transport water through their xylem more efficiently, and are therefore characterised as highly efficient in water transport.
First let’s set up the models for glmmTMB:
Log(response) Gaussian.
Gamma model.
First, let’s set up the phylogenetic correlation matrix for the glmmTMB model and the inverse matrix for MCMCglmm.
We want to obtain a phylogenetic matrix that match all genera in the Ks dataset.m We will therefore remove all missing entries for the Ks trait and prune the tree and the dataset to have the same genera in both.
Code
# create dat for ks (remove all missing values)ks.dat <-subset(hydra.dat, !is.na(Ks))all(is.na(ks.dat$genus))
[1] FALSE
Code
# get common genus dataset and treecommon.genus <-intersect(plant.tree$tip.label, ks.dat$genus)# Prune treeplant.tree.pruned <- ape::drop.tip(plant.tree, setdiff(plant.tree$tip.label, common.genus))#length(plant.tree.pruned$tip.label)# Prune datasetks.dat <- ks.dat[ks.dat$genus %in% common.genus, ]#length(table(ks.dat$genus))# check whether names match in data and treecheck <- geiger::name.check(plant.tree.pruned, ks.dat$genus, sort(plant.tree.pruned$tip.label))# set up phylogenetic correlation matrixphylo.mat.p <-vcv(plant.tree.pruned, corr =TRUE) phylo.mat.p <- phylo.mat.p[sort(rownames(phylo.mat.p)), sort(rownames(phylo.mat.p))]# set up inverse phylogenetic correlation matrix for MCMCglmmphylo.inv <- MCMCglmm::inverseA(plant.tree.pruned, nodes ="TIPS")$Ainvphylo.inv <- phylo.inv[sort(rownames(phylo.inv)),]# checks # length(colnames(phylo.mat.p))==length(table(ks.dat$genus))# all(head(rownames(phylo.mat.p))==head(colnames(phylo.mat.p)))# length(rownames(phylo.inv))==length(table(ks.dat$genus))# head(table(ks.dat$genus))
Code
# make sure to add grouping factor for glmmTMBks.dat$g <-1ks.dat$genus <-as.factor(ks.dat$genus)# Fit models --------------# normaltime_norm <-system.time( m6 <-glmmTMB(log(Ks) ~ group + (1|genus) +propto(0+ genus|g, phylo.mat.p),family =gaussian(),REML=TRUE,data = ks.dat) )# Gamma distributiontime_gamma <-system.time( m7 <-glmmTMB(Ks ~ group + (1|genus) +propto(0+ genus|g,phylo.mat.p),family =Gamma(link ="log"),data = ks.dat))
Now, we will fit the Gaussian Log(Ks) model with MCMCglmm:
Diagnostic plots
Let’s check residual plots with the DHARMa packages. First, the log(Ks) assuming normal distribution residual checks:
Get phylogenetic signal from glmmTMB Gaussian model:
Code
# get random effect component estimates on SD-scale m6_re <-as.data.frame(confint(m6, parm="theta_"))sigma2_s <- m6_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m6_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal
[1] 0.8391
We found that the phylogenetic species effect explained 83.9% of the total species level random effect variance, and hence the non-phylogenetic effect explained only 16.1% of the total species level random effect variance.
Get model estimated marginals means on the response scale:
Code
# Estimated marginal means on the response scale (odds)emm_m6 <-emmeans(m6, ~ group, type ="response")emm_m6
group response SE df lower.CL upper.CL
Angiosperms 1.22 0.938 1021 0.271 5.51
Gymnosperms 0.62 0.307 1021 0.235 1.64
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Code
# Get the contrast between the two groupsgroup_diff <-contrast(emm_m6, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")# Plotggplot(group_diff, aes(y = contrast, x = ratio)) +geom_point(size =3, colour ="#6BAF92") +geom_errorbar(aes(xmin = lower.CL, xmax = upper.CL), width =0.2, colour ="#6BAF92") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y =NULL, x ="",title ="kS contrast: Angiosperms vs Gymnosperms") +theme_bw() +theme(legend.position ="none")
2. Model for P50
P50 represents the level of water stress at which a tree experiences a 50% loss of hydraulic conductivity. More negative P50 values indicate that trees can withstand greater hydraulic stress before suffering hydraulic damage, so communities with lower P50s are considered more resilient to increasing water stress (Trugman et al., 2020).
First let’s set up the models for glmmTMB:
Try gamma model
Log(response) gaussian
First, le’s a phylogenetic matrix that match all genera in the P50 dataset. We will therefore remove all missing entries for the Ks trait and prune the tree and the dataset to have the same genera in both.
Code
# create dat for ks (remove all missing values)p50.dat <-subset(hydra.dat, !is.na(P50))# create negative P50 variablep50.dat$negP50 <-- p50.dat$P50# get common genus across dataset and treecommon.genus <-intersect(plant.tree$tip.label, p50.dat$genus)# Prune datasetp50.dat <- p50.dat[p50.dat$genus %in% common.genus, ]# Prune treeplant.tree.pruned <- ape::drop.tip(plant.tree, setdiff(plant.tree$tip.label, common.genus))# check whether names match in data and treecheck <- geiger::name.check(plant.tree.pruned, p50.dat$genus, sort(plant.tree.pruned$tip.label))# set up phylogenetic correlation matrixphylo.mat.p <-vcv(plant.tree.pruned, corr =TRUE) phylo.mat.p <- phylo.mat.p[sort(rownames(phylo.mat.p)), sort(rownames(phylo.mat.p))]# set up inverse phylogenetic correlation matrix for MCMCglmmphylo.mat.p.inv <-solve(phylo.mat.p)# checks # length(colnames(phylo.mat.p))==length(table(p50.dat$genus))# all(head(rownames(phylo.mat.p))==head(colnames(phylo.mat.p)))# head(table(p50.dat$genus))
Code
# make sure to add grouping factor for glmmTMBp50.dat$g <-1p50.dat$genus <-as.factor(p50.dat$genus)# Fit models --------------# normaltime_norm <-system.time( m8 <-glmmTMB(log(negP50) ~ group + (1|genus) +propto(0+ genus|g, phylo.mat.p),family =gaussian(),REML=TRUE,data = p50.dat) )# Gamma distributiontime_gamma <-system.time( m9 <-glmmTMB(negP50 ~ group + (1|genus) +propto(0+ genus|g,phylo.mat.p),family =Gamma(link ="log"),data = p50.dat))
Same models with MCMCglmm
Diagnostic plots
Let’s check residual plots with the DHARMa packages. First, the log(Ks) assuming normal distribution residual checks:
Residual diagnostics are relatively similar, for both distributions so we keep the Gaussian log(response) model specification.
Results
Get phylogenetic signal from glmmTMB Gaussian model:
Code
# get random effect component estimates on SD-scale m9_re <-as.data.frame(confint(m9, parm="theta_"))sigma2_s <- m9_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m9_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal
[1] 0.7369
We found that the phylogenetic species effect explained 83.9% of the total species level random effect variance, and hence the non-phylogenetic effect explained only 16.1% of the total species level random effect variance.
Get model estimated marginals means on the response scale:
Code
# Estimated marginal means on the response scale (odds)emm_m9 <-emmeans(m9, ~ group, type ="response")emm_m9
group response SE df asymp.LCL asymp.UCL
Angiosperms 2.21 1.03 Inf 0.883 5.53
Gymnosperms 4.24 1.25 Inf 2.384 7.54
Confidence level used: 0.95
Intervals are back-transformed from the log scale
Code
# Get the contrast between the two groupsgroup_diff <-contrast(emm_m9, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")# Plotggplot(group_diff, aes(y = contrast, x = ratio)) +geom_point(size =3, colour ="#6BAF92") +geom_errorbar(aes(xmin = asymp.LCL, xmax = asymp.UCL), width =0.2, colour ="#6BAF92") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y =NULL, x ="",title ="kS contrast: Angiosperms vs Gymnosperms") +theme_bw() +theme(legend.position ="none")
5 References
Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J.,Machler, M., & Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R Journal, 9 (2), 378–400. https://doi.org/10.32614/RJ-2017-066
Dunn, P. O., Armenta, J. K., & Whittingham, L. A. (2015). Natural and sexual selection act on different axes of variation in avian plumage color. Science Advances, 1(2), e1400155. https://doi.org/10.1126/sciadv.1400155
Hadfield, J. D. (2024, May). MCMCglmm: MCMC Generalised Linear Mixed Models. Retrieved October 7, 2024, from https://cran.r-project.org/web/packages/MCMCglmm/index.html
Hill, G. E., & McGraw, K. J. (2006). Bird Coloration, Volume 2: Function and Evolution. Harvard University Press. https://doi.org/10.2307/j.ctv22jnr8k
Kristensen, K., & McGillycuddy, M. (2025). Covariance structures with glmmTMB. https://cran.r-project. org/web/packages/glmmTMB/vignettes/covstruct.html
McGillycuddy, M. (2023). Model-Based Assessment of Treatment Effect in Multivariate Abundance Data [Doctoral dissertation, The University of New South Wales].
Sanchez-Martinez, P., Martinez-Vilalta, J., Dexter, K. G., Segovia, R. A., & Mencuccini, M. (2020). Adaptation and coordinated evolution of plant hydraulic traits. Ecology Letters, 23 (11), 1599–1610. https://doi.org/10.1111/ele.13584
Trugman, A. T., Anderegg, L. D. L., Shaw, J. D., & Anderegg, W. R. L. (2020). Trait velocities reveal that mortality has driven widespread coordinated shifts in forest hydraulic trait composition. Proceedings of the National Academy of Sciences, 117(15), 8532–8538. https://doi.org/10.1073/pnas.1917521117
---title: "Supporting Information for:\n Phylogenetic generalised linear mixed-effects modelling with glmmTMB R package"author: "Coralie Williams, Maeve McGillycuddy, Szymek Drobniak, Ben Bolker, David I. Warton, Shinichi Nakagawa"include-in-header: text: | <link rel="shortcut icon" href="robin.png" /> <link rel="icon" type="image/x-icon" href="robin.png">format: html: toc: true toc-location: left toc-depth: 3 theme: default embed-resources: true code-fold: show code-tools: true number-sections: true number-depth: 2 footnote-location: margineditor_options: chunk_output_type: console---<!-- - make sure to use package name:: as much as possible --><!-- setwd("C:/Users/z5394590/OneDrive - UNSW/Documents/Projects/phylo_glmm_sim/docs") --><!-- page-layout: full --><!-- <img src="robin.png" alt="Robin icon" width="25"> --># Overview ```{r setup}#| output: false#| warning: false#| label: packages#| code-overflow: wrap#| code-fold: trueknitr::opts_chunk$set( warning = FALSE, message = FALSE)pacman::p_load( rmarkdown, readr, dplyr, purrr, splitstackshape, tidyverse, tidyr, ggplot2, vroom, details, sessioninfo, devtools, readxl, forcats, stringr, ape, phytools, geiger, data.table, pavo, MASS, glmmTMB, brms, MCMCglmm, phyr, INLA, broom.mixed, knitr, bayestestR, performance, rbenchmark, here, DHARMa, emmeans, purrr)options(digits = 4, scipen = 5)```This webpage provides supporting information for the manuscript “Phylogenetic generalised linear mixed-effects modelling with the glmmTMB R package”, where we showcase the [*propto*](https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html#proportional) (proportional-to) covariance structure in the `glmmTMB` R package (McGillycuddy, 2023; Kristensen & McGillycuddy, 2025).This webpage is organised in three sections:1. First, we describe the simulated models and data-generating mechanisms for each of the five evaluated R packages for fitting PGLMMs (see [Models](#models)).2. We then use the simulated data to illustrate compilation and sampling runtimes for the Bayesian MCMC models (see [Runtimes](#runtimes)).3. Finally, we present two case studies that apply these methods to ecological and evolutionary data, the first on bird colour traits and the second on plant traits (see [Case studies](#case-studies)).### ContactFor questions or error reports, contact Coralie Williams: [coralie.williams@unsw.edu.au](mailto:coralie.williams@unsw.edu.au) or [coraliewilliams@outlook.com](mailto:coraliewilliams@outlook.com)Last updated on: `r format(Sys.time(), '%d %B %Y')`# Models {#models}We demonstrate and explain how to fit phylogenetic generalized mixed models (PGLMM) using five different packages which we used in our simulation study: `glmmTMB`, `phyr`, `MCMCglmm`, `brms`, and `INLA`. The models are fitted to one simulated dataset using a randomly generated phylogenetic tree and a set of model parameters. We compare the performance of these packages in terms of runtime, accuracy, and bias of the fixed effect mean and random effect variance estimates.We assume a set of trait measures $y_{ij}$ repeated measure (observation) $i$ and for each species $j$. The model is specified as follows:$$y_{ij} = \beta_0 + \beta_1 x_{ij} + n_j + p_j + \varepsilon_{ij}$$Where:- $y_{ij}$: trait response for measure (observation) $i$ in species $j$- $\beta_0, \beta_1$: fixed effects (intercept and slope)- $n_i \sim \mathcal{N}(0, \sigma^2_{\text{n}})$: species-level effect (non-phylogenetic)- $p_i \sim \mathcal{N}(0, \sigma^2_{\text{p}} \mathbf{A})$: species-level effect (phylogenetic), where $\mathbf{A}$ is the phylogenetic correlation matrix- $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2_{e})$: residual error## Simulate dataFirst we specify the parameter values for one simulation run. We assume the following:```{r}### set up model parameters seed <-1b0 <-1# fixed effect interceptb1 <-1.5# fixed effect slopek.species <-30# number of speciesn.reps <-10# number of repeated measures per species (assuming a balanced design)sigma2.n <-0.25# variance of non-phylogenetic effectsigma2.p <-0.25# variance of phylogenetic effectsigma2.e <-0.2# residual error variance```Simulate data based on these parameter values:```{r}#| cache: true #| eval: true#| echo: trueset.seed(seed)# set up k.obs (number of observations per species)k.obs <- n.reps # assumes a balanced design# species id sp.id <-rep(seq_len(k.species), times=k.obs)# total number of observationsn <-length(sp.id)# simulate simple dataframe with covariate (x) variablex <-runif(n, 10, 20) dat <-data.frame(obs =1:n, x = x, species = sp.id)# simulate tree and obtain phylo matrixtree <- ape::rtree(k.species, tip.label =seq_len(k.species))tree <- ape::compute.brlen(tree, power=1) # power of 1 means ultra-metric treephylo.mat <- ape::vcv(tree, corr =TRUE) # we want a correlation matrix (bounded by -1 and 1) ultrametric treephylo.mat <- phylo.mat[order(as.numeric(rownames(phylo.mat))), order(as.numeric(rownames(phylo.mat)))]# Simulate response variable (phen) based on cofactor and phylogenetic matrixu.s <-rnorm(k.species, 0, sqrt(sigma2.n))[sp.id]u.p <- MASS::mvrnorm(1, mu=rep(0, k.species), Sigma=sigma2.p*phylo.mat)[sp.id]ei <-rnorm(n, 0, sqrt(sigma2.e))# get estimates of yyi <- b0 + b1*x + u.s + u.p + ei### append all to dataframedat <-cbind(dat, u.s, u.p, ei, b0, b1, yi)dat$phylo <- dat$species # phylo ID variable (same as species) - needs to be numeric to work with INLAdat$species <-factor(dat$species) # format species variable for modelsdat$sp <- dat$species # create sp variable (for phyr)dat$g <-1# add variable g constant (for glmmTMB)save(dat, tree, phylo.mat, file ="simulated_data.RData")```View first five rows of the simulated dataset:```{r}head(dat)```## Run models### Run model with `glmmTMB`To fit a PGLMM using glmmTMB package we specify the species-level random effect with phylogenetic relatedness using the `propto` covariance structure. In the first part we specify the random intercept (`0 + species`), followed by the grouping variable `g` (which we assume here as constant), and then the phylogenetic correlation matrix `phylo.mat`.The model output will provide the fixed effect estimates, random effect variance estimates, and the residual variance estimate. The random effect variance estimates will be on the standard deviation scale, so we square them to get the variance estimates.```{r}#| cache: true #| eval: true#| echo: true#| label: Simulated glmmTMB model# repeated measures per speciestime.glmmTMB <-system.time({ model_glmmTMB <-glmmTMB(yi ~ x + (1|species) +propto(0+ species|g, phylo.mat),data = dat,REML =TRUE)})```Check if the model converged (i.e. returns `TRUE` if the Hessian is positive definite, for more details read here: https://cran.r-project.org/web/packages/glmmTMB/vignettes/troubleshooting.html):```{r}model_glmmTMB$sdr$pdHess```We can then get the fixed effect estimates and their confidence intervals:```{r}coefs_tmb <-as.data.frame(confint(model_glmmTMB, parm="beta_"))coefs_tmb```The random effect component estimates are on the standard deviation scales fron the `confint` function, so we need to square them to get the variance estimates. We also compute the standard error and confidence intervals on the variance scale using the delta method.```{r}re_tmb <-as.data.frame(confint(model_glmmTMB, parm="theta_")) # on standard deviation scalere_tmb# Compute variance, SE (delta method), and CI on variance scale)var_est <- re_tmb$Estimate^2var_se <-2* re_tmb$Estimate * (re_tmb$`97.5 %`- re_tmb$`2.5 %`) / (2*1.96)var_ci_low <- re_tmb$`2.5 %`^2var_ci_high <- re_tmb$`97.5 %`^2# Residual varianceresid_var <-sigma(model_glmmTMB)^2# Combine resultssigma2_tmb <-data.frame(model ="glmmTMB",group =c("phylo", "species", "Residual"),term ="var",estimate =c(var_est, resid_var),std.error =c(var_se, NA),conf.low =c(var_ci_low, NA),conf.high =c(var_ci_high, NA))sigma2_tmb```### Run model with `phyr`To fit the model using `phyr` package we use the `cov_ranef` argument to specify the phylogenetic tree directly. The random effect term `(1|sp__)` indicates that we want to include both phylogenetic and non-phylogenetic random effects. The underscores `__` indicate that we want to include phylogenetic correlations in the model.```{r}#| cache: true #| label: Simulated phyr modeltime.phyr <-system.time({ model_phyr <-pglmm(yi ~ x + (1|sp__),cov_ranef =list(sp = tree),data = dat,REML =TRUE)})```Obtain fixed effect estimates from model:```{r}coefs_phyr <-as.data.frame(fixef(model_phyr))coefs_phyr$conf.low[2] <- coefs_phyr$Value[2] - coefs_phyr$Std.Error[2]*1.96coefs_phyr$conf.high[2] <- coefs_phyr$Value[2] + coefs_phyr$Std.Error[2]*1.96```Obtain the random effect and residual variance estimates. Note that `phyr` provides them on the standard deviation scale.```{r}# get phyr random effect variance estimates var_re_phyr <-c(as.numeric(model_phyr$ss[2])^2, #phylogenetic as.numeric(model_phyr$ss[1])^2, #non-phylogenetic as.numeric(model_phyr$ss[3])^2) #residual # combine into dataframesigma2_phyr <-data.frame(model ="phyr",group =c("phylo", "species", "Residual"),term ="var",estimate = var_re_phyr,std.error =NA,conf.low =NA, conf.high =NA)sigma2_phyr```### Run model with `MCMCglmm`To fit the MCMCglmm model we first we need to set up a precision matrix for the phylogenetic random effect, which (which is the inverse of the phylogenetic covariance matrix). We use the `inverseA` function the package to obtain the inverse of the phylogenetic covariance matrix.These priors specify weakly informative settings: both random effects ($G1$, $G2$) use inverse-Wishart style priors ($V=1$, $\nu=1$, with $\alpha.\mu=0$, $\alpha.V=1000$), while the residual variance ($R$) has an almost flat inverse-Gamma prior ($V=1$, $\nu=0.02$).The `ginverse` argument is used to specify the inverse of the phylogenetic covariance matrix. Then we specify the number of iterations, burn-in, and thinning parameters for the MCMC sampling. Here we increased the number of iterations by 30 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased. Usually you would want to set these to some reasonably large values after inspection of diagnostic plots (i.e. trace plots and posterior distributions).```{r}#| cache: true #| label: Simulated MCMCglmm model# get precision phylo matrix and order rowsphylo.prec.mat <- MCMCglmm::inverseA(tree, nodes ="TIPS", scale =TRUE)$Ainvphylo.prec.mat <- phylo.prec.mat[order(as.numeric(rownames(phylo.prec.mat))),order(as.numeric(rownames(phylo.prec.mat)))]# set priors for two random effectsprior <-list(G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=1000), G2=list(V=1,nu=1,alpha.mu=0,alpha.V=1000)),R=list(V=1,nu=0.02))# fit MCMCglmm modeltime.mcmc <-system.time({ model_mcmc <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt =303000, # increase default by x30burnin =3000, # defaultthin =10) # default})```Plot the trace plots of the parameters to check for convergence. We expect to see good mixing of the chains and no trends.```{r}plot(model_mcmc$Sol, ask =FALSE)```Next we check the effective sample size (ESS). The ESS should be above 400 for both fixed and random effects and the Rhat value should be below 1.01 (Vehtari et al., 2021).```{r}min(effective_sample(model_mcmc, effects ="fixed")$ESS)min(effective_sample(model_mcmc, effects ="random")$ESS)```We can also check the Heidelberger diagnostic test which checks if the MCMC chains have converged and are stationary ( we expect to see 'passed' for all parameters). In practice it is important to inspect diagnostic plots of the Markov chains and posterior distributions.```{r}fullchain <-cbind(as.mcmc(model_mcmc$Sol), as.mcmc(model_mcmc$VCV))heidel.diag(fullchain)```Obtain the fixed effect estimates with the `tidy` function:```{r}coefs_mcmc <-as.data.frame(tidy(model_mcmc, effects="fixed", conf.int=TRUE))coefs_mcmc```Obtain the random effect and residual variance estimates:```{r}# get MCMCglmm random effect estimates (variance scale)sigma2_mcmc <-tidy(model_mcmc, effects="ran_pars", conf.int=TRUE)sigma2_mcmc <- sigma2_mcmc %>%mutate(model="MCMCglmm",group=str_replace(group,"animal", "phylo")) %>% dplyr::select(model, group, term, estimate, std.error, conf.low, conf.high)sigma2_mcmc```### Run model with `brms`To fit the model using the `brms` package we specify the random effects using the `(1|species)` term for non-phylogenetic random effects and `gr(phylo, cov = phylo.mat)` for phylogenetic random effects. The `data2` argument is used to pass the phylogenetic correlation matrix to the model (the same matrix format as for glmmTMB). We set the number of iterations, chains, and cores to reasonable values to ensure convergence and stability of the MCMC chains. Here we increased the number of iterations by 10 times the default value to ensure convergence and stability of the MCMC chains, but this could be further increased (and should be if ESS are low and Rhat values are higher than 1.01).```{r}#| cache: true #| label: Simulated brms modeltime.brms <-system.time({ model_brms <-brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)), #phylo.mat is the correlation matrixdata = dat,family =gaussian(),chains =4, # defaultiter =20000, # increased default x10cores =4, # equal to number of chainsdata2 =list(phylo.mat = phylo.mat))})```Look at the trace plots of the parameters to check for convergence. We expect to see good mixing of the chains and no trends.```{r}plot(model_brms, N =2, ask =FALSE)```Check that the maximum effective sample size (ESS) of the model is high enough and check the Rhat value is below 1.01 (Vehtari et al. 2021).```{r}# check ESS min(bayestestR::effective_sample(model_brms, effects ="fixed")$ESS)min(bayestestR::effective_sample(model_brms, effects ="random")$ESS)# check RHat value is below 1.01 (Vehtari et al. 2021)max(rhat(model_brms))<1.01# alternatively, we can use the bayestestR package to check the diagnostics of fixed effectsprint(bayestestR::diagnostic_posterior(model_brms), digits =4)```We can also use the diagnostic test from the `bayestestR` package to check the convergence of the MCMC chains.```{r}print(bayestestR::diagnostic_posterior(model_brms),digits =4)```Obtain the fixed effect estimates with `tidy` function:```{r}coefs_brm <-as.data.frame(tidy(model_brms, effects="fixed", conf.int=TRUE))coefs_brm```Obtain the random effect and residual variance estimates:```{r}sigma_brms <-tidy(model_brms, effects="ran_pars")sigma2_brms <- sigma_brms %>%mutate(model="brms",term=str_replace(term, "sd", "var"),estimate=estimate^2) %>%##compute variance estimates dplyr::select(model, group, term, estimate, std.error, conf.low, conf.high)sigma_brms```### Run model with `INLA`For the `INLA` package we set up the model using the `f()` function to specify the random effects. The `model = "iid"` argument indicates that we want to use an independent and identically distributed (iid) random effect for species, while the `model = "generic0"` argument is used for the phylogenetic random effect. The `Cmatrix` argument is used to specify the phylogenetic correlation matrix, which should be a precision matrix similar to `MCMCglmm`. We note that priors can be incorporated for parameter using the argument `hyper=`.```{r}#| cache: true #| label: Simulated INLA modeltime.inla <-system.time({ model_inla <-inla(yi ~ x +f(species, model ="iid") +f(phylo, ## this needs to be a numeric to workmodel ="generic0",Cmatrix = phylo.prec.mat),family ="gaussian",data = dat)})fit_inla <-summary(model_inla)```Obtain the fixed effect estimates:```{r}coefs_inla <-as.data.frame(fit_inla$fixed)coefs_inla```Obtain the random effect variance estimates. Note that `INLA` provides the **inverse of the variance** (precision) in their output.```{r}re_inla <-1/fit_inla$hyperparsigma2_inla <-data.frame(model ="INLA",group =c( "Residual", "species", "phylo"),term ="var",estimate = re_inla$mean,std.error = fit_inla$hyperpar$sd,conf.low = re_inla$`0.025quant`,conf.high = re_inla$`0.975quant`)sigma2_inla```**Additional note:** if a model returns extreme or unreasonable variance estimates, it is important to check robustness by refitting the same model with another package. In INLA, centering the random effect estimates (by specifying `const = TRUE` within the random effect term) or trying higher starting values (default is 4, e.g. `hyper = list(prec=list(initial=8))`) to see if the estimates converge to more reasonable values. Random effect variance estimates are often unstable, so they should be interpreted with care.### Model estimatesSummary of model runtime, covariate $x$ estimate and it's confidence interval for each model:```{r}#| cache: true#| eval: true#| echo: true#| label: Simulated model fixed effect estimatestime.phyr <-as.numeric(time.phyr[3])time.glmmTMB <-as.numeric(time.glmmTMB[3])time.brms <-as.numeric(time.brms[3])time.mcmc <-as.numeric(time.mcmc[3])time.inla <-as.numeric(time.inla[3])# combine fixed effect results res_fixed <-data.frame(model =c("phyr", "glmmTMB", "brms", "MCMCglmm", "INLA"),species_size = k.species,sample_size = n,run_time =c(time.phyr, time.glmmTMB, time.brms, time.mcmc, time.inla),b0 =rep(dat$b0[1], 5),b1 =rep(dat$b1[1], 5),mu =c(coefs_phyr$Value[2], coefs_tmb$Estimate[2], coefs_brm$estimate[2], coefs_mcmc$estimate[2], coefs_inla$mean[2]),mu_ci_low =c(coefs_phyr$conf.low[2], coefs_tmb$`2.5 %`[2], coefs_brm$conf.low[2], coefs_mcmc$conf.low[2], coefs_inla$`0.025quant`[2]),mu_ci_high =c(coefs_phyr$conf.high[2], coefs_tmb$`97.5 %`[2], coefs_brm$conf.high[2], coefs_mcmc$conf.high[2], coefs_inla$`0.975quant`[2]),stringsAsFactors =FALSE)kable(res_fixed, caption ="Runtime and fixed effect estimates of the simulated model",col.names =c("Model", "Species size", "Sample size", "Run time (s)", "b0", "b1", "Estimate (b1)", "CI low (b1)", "CI high (b1)"),digits =3,format ="html")```Summary of variance component estimates for each model:```{r}#| cache: true#| eval: true#| echo: true#| label: Simulated model random effect estimates# combine results togethers2 <-as.data.frame(rbind(sigma2_phyr, sigma2_tmb, sigma2_brms, sigma2_mcmc, sigma2_inla))# get subsets for each groups2_phylo <- s2 %>%filter(group=="phylo")s2_sp <- s2 %>%filter(group=="species")s2_res <- s2 %>%filter(group=="Residual")res_rand <-data.frame(model =c("phyr", "glmmTMB", "brms", "MCMCglmm", "INLA"),species_size = k.species,sample_size = n,run_time =c(time.phyr, time.glmmTMB, time.brms, time.mcmc, time.inla),sigma2_phylo = s2_phylo$estimate,sigma2_species = s2_sp$estimate,sigma2_residual = s2_res$estimate,stringsAsFactors =FALSE)kable(res_rand, caption ="Runtime and random component variance estimates of the simulated model",col.names =c("Model", "Species size", "Sample size", "Run time (s)", "Phylo variance est.", "Non-phylo variance est.", "Residual variance est."),digits =3,format ="html")```## Extra analyses and notes::: panel-tabset# One measure per species> Sometimes we only have one measure per species. This may happen because of limited sampling or because repeated observations for a species have already been averaged. In such cases, it is common to work with a single value per species.Here we will demonstrate how to fit a phylogenetic model when there is one measure per species. We will use the dataset `dat` of repeated measures per species simulated above (300 observations of 30 species) and compute a mean value for each species. ```{r}dat.m <- dat |>group_by(species) |>summarise(x_mean =mean(x, na.rm =TRUE),u.s_mean =mean(u.s, na.rm =TRUE),u.p_mean =mean(u.p, na.rm =TRUE),ei_mean =mean(ei, na.rm =TRUE),yi_mean =mean(yi, na.rm =TRUE),g =1,.groups ="drop" )head(dat.m)```So we now have a reduced dataset of 30 rows, with one value per species. If we now try to fit the same phylogenetic GLMM as before, including both a random intercept for species and a phylogenetic effect:```{r}m <-glmmTMB( yi_mean ~ x_mean + (1|species) +propto(0+ species|g, phylo.mat),data = dat.m,REML =TRUE)# fixed effect estimatesas.data.frame(confint(m, parm="beta_"))# random component estimates (on SD scale)as.data.frame(confint(m, parm="theta_"))# residual variance estimatesigma(m)^2```This model converged, but it returned very wide uncertainty intervals for the non-phylogenetic variance component (on the standard deviation scale). This reflects the fact that, with only one observation per species, the species intercept and the phylogenetic effect are not separately identifiable by the model. In some cases, the model may even fail to converge, highlighting the redundancy of including both terms.```{r}m <-glmmTMB( yi_mean ~ x_mean +propto(0+ species|g, phylo.mat),data = dat.m,REML =TRUE)# fixed effect estimatesas.data.frame(confint(m, parm="beta_"))# random component estimates (on SD scale)as.data.frame(confint(m, parm="theta_"))# residual variance estimatesigma(m)^2```When the non-phylogenetic component is removed, the estimate of the phylogenetic effect remains unchanged, the fixed effects are identical, and the residual variance increases to absorb the variability that can no longer be partitioned between the two random effects.# Random slope> In some analyses, we may want to include a random slope for the phylogenetic effect. This allows us to ask whether the relationship between a covariate and the response varies among species in a way that reflects their shared evolutionary history. For example, we might be interested in whether species with different body weights show different slopes, or whether the effect of a treatment differs across species in a phylogenetically structured way.Given the above simulated dataset, we assume a set of trait measures $y_{ij}$ repeated measure (observation) $i$ and for each species $j$. The phylogenetic random slope model assuming independence is specified as follows:$$y_{ij} = \beta_0 + \beta_1 x_{ij} + n_j + p_{0j} + p_{1j}\,x_{ij} + \varepsilon_{ij}$$Where:- $p_{0} \sim \mathcal{N}(0, \sigma^2_{p0}\mathbf{A})$: phylogenetic random intercept - $p_{1} \sim \mathcal{N}(0, \sigma^2_{p1}\mathbf{A})$: phylogenetic random slope, independent of $p_{0}$```{r}m.slope <-glmmTMB(yi ~ x + (1|species) +propto(0+ species|g, phylo.mat) +propto(0+ x:species|g, phylo.mat),data = dat,REML =TRUE)```Then we can obtain the variance estimates:```{r}vc.slope <-VarCorr(m.slope) vc.slope$cond$species[1] #non-phylo variancevc.slope$cond$g[1] #phylo intercept variancevc.slope$cond$g.1[1] #phylo slope variance```### What if there is a correlation between random intercept and slope? `glmmTMB` cannot currently estimate correlated random intercepts and slopes when using the `propto` covariance structure. This means the intercept–slope covariance is fixed at zero and cannot be separated out. A common workaround is to **centre the covariate**, which reparameterises the model so that the intercept represents the expected response at the mean covariate value. This improves interpretability and typically stabilises slope variance estimates, though it does not recover any true intercept–slope correlation. If the data contain little or no correlation, the variance estimates from `glmmTMB` with centred covariates will usually be close to those from packages that estimate the correlation. However, if the correlation is strong, `glmmTMB` may distort the partitioning of variance because the shared variation has no explicit covariance term. For comparison, the random-effects structure in a model with correlated random intercepts and slopes is: $$\begin{bmatrix}\sigma^2_{\text{intercept}} & \sigma_{\text{intercept},\text{slope}} \\\sigma_{\text{intercept},\text{slope}} & \sigma^2_{\text{slope}}\end{bmatrix}$$ while in `glmmTMB` with `propto` it is constrained to: $$\begin{bmatrix}\sigma^2_{\text{intercept}} & 0 \\0 & \sigma^2_{\text{slope}}\end{bmatrix}$$ **Recommendation:** fit models in both `glmmTMB` and in Bayesian frameworks such as the `brms` and `MCMCglmm` packages in R. Use globally centred covariates in both cases: centring helps stabilise estimation in `glmmTMB` and also improves interpretability in Bayesian models, though Bayesian approaches can additionally estimate the correlation directly. - If there is **truly little or no correlation**, the results will look very similar to `glmmTMB`: the variance estimates will be close, and the correlation will be near zero. - If there is **strong correlation**, the slope and intercept variance estimates in `glmmTMB` may still approximate the overall variability, but they could be biased, because part of the shared variation is forced into the variance components rather than partitioned as covariance.# Phylogenetic tree uncertainty> This section is still a work in progress.> Phylogenetic trees are always approximations and never represent the “true” evolutionary history with complete accuracy. In some cases, however, we may have access to multiple candidate trees. These can be used to fit separate models and then combine the results, for example using Rubin’s rules for multiple imputation.Here, we illustrate this approach with the simulated repeated-measures species dataset. As a first step, we generate 50 random phylogenetic trees to represent a set of hypothetical candidate trees:```{r}set.seed(123)# simulate 50 random trees trees <-replicate(50, ape::rtree(k.species, tip.label = sp.id), simplify =FALSE)# now make a phylogenetic correlation matrix for each tree,make_phylo_mat <-function(tree, sp) { C <-vcv(tree, corr =TRUE) ord <-match(sp.id, rownames(C)) C[ord, ord, drop =FALSE]}phylo_mats <-lapply(trees, make_phylo_mat, sp = sp.id)```We fit the same model across 50 candidate trees and pool fixed effects using Rubin’s rules (average within-tree variance plus between-tree variance):```{r}#| eval: false#| echo: falsefit_one <-function(Cmat) { mod <-glmmTMB(yi ~ x + (1|species) +propto(0+ species|g, Cmat),data = dat, REML =TRUE) beta <-fixef(mod)$cond V <-diag(vcov(mod)$cond) reci <-confint(mod, parm ="theta_") list(beta = beta, var_w = V,re_sd = reci[,"Estimate"],resid_var =sigma(mod)^2)}# Fit across all treesfits <- purrr::map(phylo_mats, fit_one)m <-length(fits)```For the random-effect SDs and the residual variance across trees [....]```{r}#| eval: false#| echo: false# --- Rubin's rules for FIXED EFFECTS ---# Stack estimates and within-variances into matrices (rows = trees, cols = terms)terms <-names(fits[[1]]$beta)est_mat <-do.call(rbind, map(fits, ~ .x$beta[terms]))var_mat <-do.call(rbind, map(fits, ~ .x$var_w[terms]))qbar <-colMeans(est_mat)Ubar <-colMeans(var_mat)B <-apply(est_mat, 2, var)Tvar <- Ubar + (1+1/m) * Bse <-sqrt(Tvar)ci_lo <- qbar -1.96* seci_hi <- qbar +1.96* serubin_fixed <-data.frame(term = terms, estimate = qbar, se = se,conf.low = ci_lo, conf.high = ci_hi,row.names =NULL)rubin_fixed``````{r}#| eval: false#| echo: false# --- Random-effect SDs -- CHECK THIS# Collect each theta SD by name across treesre_names <-names(fits[[1]]$re_sd)re_mat <-sapply(fits, function(x) x$re_sd[re_names]) # cols = treesre_mean <-rowMeans(re_mat)re_median <-apply(re_mat, 1, median)re_q25 <-apply(re_mat, 1, quantile, probs =0.25)re_q75 <-apply(re_mat, 1, quantile, probs =0.75)re_summary <-data.frame(theta = re_names,mean_sd = re_mean,median_sd = re_median,q25 = re_q25, q75 = re_q75,row.names =NULL)# --- Residual variance -- CHECK THISresids <-sapply(fits, `[[`, "resid_var")resid_summary <-data.frame(mean_resid_var =mean(resids),median_resid_var =median(resids),q25 =quantile(resids, 0.25),q75 =quantile(resids, 0.75))# summariesre_summaryresid_summary```:::<!-- Extra things I want to add to webpage: --><!-- - Incorporating phylogenetic tree uncertainty using rubin's rule (with several candidate trees) --># Bayesian MCMC model runtime {#runtimes}Here we provide a breakdown of the compilation and sampling runtimes for the Bayesian MCMC models using the above simulated repeated measures dataset. The runtimes are provided for each model fitted using the `MCMCglmm` and `brms` packages. Note: the compilation time is the time taken to compile the model, while the sampling time is the time taken to sample from the posterior distribution of the model parameters.First we set up the MCMCglmm model.### MCMCglmmSet up MCMCglmm tuning parameters - usually set to some reasonably large values through trial and error.```{r}#| eval: true#| echo: trueNITT <-100000BURNIN <-floor(0* NITT)THIN <-floor((NITT - BURNIN) /1500)```Run a model that prepares the run but does not start actual sampling to have the baseline timing of the model pre-run protocols.```{r}#| eval: true#| echo: trueruntime_1_pre <-benchmark("process"= { model_mcmcglmm_pre <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt =1, burnin =0, thin =1 ) }, replications =1)```Run full reasonable model (like above). Extract MCMC trace.```{r}#| cache: true #| eval: true#| echo: trueTHIN <-1model_mcmcglmm_max <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt = NITT, burnin = BURNIN, thin = THIN)summary(model_mcmcglmm_max)par_mcmcglmm_max <- model_mcmcglmm_max$VCV[, "species"]plot(par_mcmcglmm_max)# geweke.plot(par_mcmcglmm_max)# geweke.diag(par_mcmcglmm_max)# heidel.diag(par_mcmcglmm_max)test <-raftery.diag(par_mcmcglmm_max)# update your run parametersNITT <- test$resmatrix[,"N"] + test$resmatrix[,"M"]THIN <-ceiling(test$resmatrix[,"N"] / test$resmatrix[,"Nmin"])BURNIN <- test$resmatrix[,"M"]runtime_1_run <-benchmark("process"= { model_mcmcglmm_1_run <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt = NITT, burnin = BURNIN, thin = THIN ) }, replications =1)```### BRMS"Reasonable" `brms` parameters.```{r}NITTb <-10000BURNINb <-1000THINb <-1CHAINS <-1```Run a model that prepares the run but does not start actual sampling to have the baseline timing of the model pre-run protocols.```{r}#| cache: true #| results: "hide"runtime_2_pre <-benchmark("process"= { model_brms_1_pre <-brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)),data = dat,data2 =list(phylo.mat = phylo.mat),chains = CHAINS,iter =2,warmup =1,thin =1 ) }, replications =1)```Run "larger" model and convert posterior to MCMC object.```{r}#| results: "hide"#| cache: true model_brms_max <-brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)),data = dat,data2 =list(phylo.mat = phylo.mat),chains = CHAINS,iter = NITTb,warmup = BURNINb,thin = THINb)summary(model_brms_max)par_brms_max <-as.mcmc(model_brms_max, pars ="sd_species__Intercept")[[1]]plot(par_brms_max)test <-raftery.diag(par_brms_max)# update your run parametersNITTb <- test$resmatrix[,"N"] + test$resmatrix[,"M"]THINb <-ceiling(test$resmatrix[,"N"] / test$resmatrix[,"Nmin"])BURNINb <- test$resmatrix[,"M"]runtime_2_run <-benchmark("process"= { model_brms_1_run <-brm(yi ~ x + (1|species) + (1|gr(phylo, cov = phylo.mat)),data = dat,data2 =list(phylo.mat = phylo.mat),chains = CHAINS,iter = NITTb,warmup = BURNINb,thin = THINb ) }, replications =1)```### Summary runtimesSummary table of runtimes for compilation and sampling:```{r}compil.perc.1<-round(runtime_1_pre$elapsed / (runtime_1_pre$elapsed+runtime_1_run$elapsed) *100, 1)compil.perc.2<-round(runtime_2_pre$elapsed / (runtime_2_pre$elapsed+runtime_2_run$elapsed) *100, 1)sampl.perc.1<-round(runtime_1_run$elapsed / (runtime_1_pre$elapsed+runtime_1_run$elapsed) *100, 1)sampl.perc.2<-round(runtime_2_run$elapsed / (runtime_2_pre$elapsed+runtime_2_run$elapsed) *100, 1)time.summary <-data.frame(model =c("MCMCglmm", "brms"),compilation.time =c(runtime_1_pre$elapsed, runtime_2_pre$elapsed),perc =c(compil.perc.1, compil.perc.2), sampling.time =c(runtime_1_run$elapsed - runtime_1_pre$elapsed, runtime_2_run$elapsed - runtime_2_pre$elapsed),perc =c(sampl.perc.1, sampl.perc.2))kable(time.summary)```# Case studies {#case-studies}## BackgroundThe following case studies illustrate how phylogenetic mixed models can be applied to diverse questions in ecology and evolution. The first examines patterns of plumage colour in birds, while the second explores macroevolutionary patterns in plant traits. Together, they demonstrate the flexibility of these methods across taxa and research questions.All models, results, and datasets presented in these case studies are **intended for illustrative purposes only**. They are designed to demonstrate analytical approaches rather than to provide definitive biological insights. No substantive conclusions should be drawn from these analyses.## Case study 1: Bird color evolutionBirds are a valuable system for studying ecological and evolutionary processes because their diversity in form, behaviour, and coloration is well documented across many species and habitats. Variation in plumage colour, in particular, can provide insights into mechanisms such as sexual selection, camouflage, and signalling. By examining these traits, researchers can better understand how environmental and evolutionary pressures shape biodiversity.[credits: Andy Chilton, Boris Smokrovic, David Clode, Jaime Spaniol, Joshua J. Cotte, Vincent Van Zalinge / unsplash.com.]{style="color:grey; font-size:0.9em;"}### Data overviewLoad data bird spectrum data:```{r}#| eval: true#| echo: true#| cache: true#| warnings: false#| label: Case study 1- Load data# Load bird spectrum data bird_data <-read_delim("data/Spec_IndivReg_Coralie.csv", delim =";", escape_double =FALSE, locale =locale(decimal_mark =","), trim_ws =TRUE)# summary of birds species and genuslength(table(bird_data$genus_original)) # 446 genuslength(table(bird_data$sci_name_Jetz)) # 949 species#length(table(bird_data$sci_name_original)) # 952 original species# Remove the specified species from bird_databird_data <- bird_data |>filter(!sci_name_Jetz %in%c("Basileuterus_rufifrons", "Malurus_lamberti", "Malurus_splendens"))########## Notes ########## missing values ---> not sure why?#na_count <- colSums(is.na(dat2))#na_count[na_count > 0]# individuals a-c are male and d-f are female#table(bird_data$sex, bird_data$individual)# number of measurements?#table(bird_data$Nmeasured)# duplicates per individual per body region#dup <- dat |># summarise(n = n(), # .by = c(wl, individual_nonrep, sex, body_region))#r <- dup[which(dup$n>1),]#table(r$individual_nonrep)```**Some notes:**- The dataset contains data for 949 bird species, with 446 unique genera.- Spectral values are normalised reflectance data.- `Nmeasured` is the number of measurements per body patch.- Each body region measured 5 times (then averaged).- These species seem to have two measurements per body region per individual, which we remove to facilitate the analysis.:``` Basileuterus_rufifrons, Malurus_lamberti, Malurus_splendens ```### Color data set-up#### Obtain wavelength datasetNow we want to create a wavelength dataset using the `pavo` package given the spectral reflectance data.```{r}#| eval: true#| echo: true#| cache: true#| label: Case study 1- Obtain wavelength dataset# set up wavelength dataset (wavelengths columns 300 to 474)dat <- bird_data |>pivot_longer(cols =`300`:`700`, names_to="wl", values_to="refl") |> dplyr::select(individual_nonrep, body_region, sex, wl, refl) |>mutate(wl =as.numeric(wl))# pivot so each column is an individual body region measurementdat2 <- dat |>pivot_wider(names_from =c("individual_nonrep", "sex", "body_region"),values_from ="refl", names_sep =".")# create spectral dataset with pavospecs <-as.rspec(dat2)# some examples of individual birds body region color spectrumplot(Acrocephalus_palustris_a.Male.throat ~ wl, type="l", data=specs, title="Acrocephalus palustris (throat)")plot(Alle_alle_a.Male.throat ~ wl, type="l", data=specs, title="Alle alle (throat)")plot(Aplonis_metallica_a.Male.wing_cov ~ wl, type ="l", data = specs, title="Aplonis metallica (wing covert)")# use procspec to adjust negative values by shifting them by 10# (this maybe due to darker colors)specs <-procspec(specs, fixneg="addmin")plot(specs, select =10)```#### Obtain spectral shape descriptorsUsing the wavelength dataset we can now obtain spectral shape descriptors. These descriptors will be used in the model. Here we will focus on four descriptors of interest: brightness (B1), spectral slope (S1), spectral curvature (S9), and hue (H4). More information about the spectral shape descriptors can be found here: https://book.colrverse.com/spectral-shape-descriptors.html```{r}#| eval: true# Obtain spectral shape descriptorsspec.des <-summary(specs, subset =c("B1", "B2", "S9", "H4")) ## using subset makes it faster to run by selecting the shapes of interestdev.off()# distribution of B1ggplot(spec.des, aes(x = B1)) +geom_histogram(bins =30, colour ="black", fill ="grey") +labs(x ="B1", title ="B1: Total brightness") +theme_bw()# distribution of S9ggplot(spec.des, aes(x = S9)) +geom_histogram(bins =30, colour ="black", fill ="grey") +labs(x ="S9", title ="S9: Carotenoid chroma") +theme_bw()# vdistribution o fH4ggplot(spec.des, aes(x = H4)) +geom_histogram(bins =30, colour ="black", fill ="grey") +labs(x ="H4", title ="H4: Hue (segment classification)") +theme_bw()```The first dataset is the shape descriptors dataset, which we will use to model the continuous trait (brightness).```{r}#| eval: true#| echo: truespec.des$rowname <-rownames(spec.des)spec.dat <- spec.des |>mutate(sex =case_when(grepl("Male", rowname) ~"male",grepl("Female", rowname) ~"female"),species =gsub("\\_[a-f]\\..*$", "", rowname),body_region =str_extract(rowname, "[a-z]+$") ) write.csv(spec.dat, file="data/spec_data_for_model.csv")```#### Obtain carotenoid datasets: binary and ordinal traitsWe will use the carotenoid dataset to model the presence/absence of carotenoid coloration in birds. This dataset is based on the spectral reflectance data and the spectral shape descriptors.Let's set up the dataset for modelling a binary trait (absence/presence of carotenoid color):```{r}#| cache: true# 1) get human visual model (CIE 10 degree observed under D65 "day light")vm <-vismodel(specs, visual ="cie10", illum ="D65", bkg ="ideal", relative =FALSE)# 2) convert to CIELAB colorspace (to get hue and chroma)lab <-colspace(vm, space ="cielab") # 3) get chroma and hue angle (converting CIELAB to CIELCh)L <- lab$La <- lab$ab <- lab$bC <-sqrt(a^2+ b^2)h <- (atan2(b, a) *180/ pi) %%360# degrees in 0 to 360# 4) thresholds for carotenoid range and saturationh_lower <-330# start of redh_upper <-100# end of yellowC_min <-15# min chromaL_min <-20# avoid very dark samples# 5) Hue range test in_range <- h >= h_lower | h <= h_upper# 6) COmpute binary trait: 1 = carotenoid like colour present, 0 = absentpresent <-as.integer(in_range & C >= C_min & L >= L_min)# set up datasetcarot.dat.all <-data.frame(rowname =rownames(lab),L = L, a = a, b = b, C = C, h = h,carotenoid = present)# merge with spec.data based on "rowname" columncarot.dat.all <-merge(spec.dat, carot.dat.all, by.x ="rowname", by.y ="rowname", all =TRUE)# create individual id variablecarot.dat.all$indiv_rep <-sub(".*_([a-f])\\..*$", "\\1", carot.dat.all$rowname)# number of observations with carotenoid like colourtable(carot.dat.all$carotenoid)# # checks# carot.dat.all |># filter(L>20 & C>12 & in_range) |># arrange(desc(L))# # check bird cardinalis cardinalis (should be yellow or red?)# carot.dat.all |> # filter(grepl("Cardinalis_cardinalis", rowname)) |> # arrange(desc(h))# Plotggplot(carot.dat.all, aes(h, C)) +geom_point(data =subset(carot.dat.all, carotenoid ==0),shape =21, size =1.5, fill ="grey85", colour ="grey70", alpha =0.6) +geom_point(data =subset(carot.dat.all, carotenoid ==1),aes(fill = L, colour = L), shape =21, size =1.8, stroke =0.4, alpha =0.95) +scale_fill_gradient(name ="L*", limits =c(15, 50), low ="darkred", high ="lightcoral") +scale_colour_gradient(guide ="none", limits =c(15, 50), low ="darkred", high ="lightcoral") +labs(x ="Hue (degrees)", y ="Chroma (C*)", title ="Carotenoid presence") +theme_bw()```Let's save the dataset for modelling later on:```{r}carot.dat <- carot.dat.all# save as csvwrite.csv(carot.dat, file="data/carotenoid_data_for_model.csv", row.names =FALSE)```Now let's obtain the dataset where we summarise for each individual bird the proportion of body region with carotenoid color presence (ordinal data trait), and save it for modelling later on:```{r}carot.ordinal <- carot.dat.all |>group_by(species, sex, indiv_rep) |>summarise(prop_carotenoid =mean(carotenoid), .groups ="drop")# save as csvwrite.csv(carot.ordinal, file="data/ordinal_data_for_model.csv", row.names =FALSE)```### Phylogenetic correlation matrix set-upLoad and view phylogenetic tree:```{r}#| output: true#| warnings: true#| eval: true#| label: Case study 1- Phylogenetic tree#| # Load bird tree (consensus tree = "combined tree")bird.tree <-read.tree("data/Stage2_Hackett_MCC_no_neg.tre")### Prune bird treebird.pruned <-keep.tip(bird.tree, bird_data$sci_name_Jetz)# check whether names match in data and treecheck <-name.check(bird.pruned, bird_data$sci_name_Jetz, sort(bird.pruned$tip.label))# plot treeplotTree(bird.pruned, ftype="i", fsize=0.4, lwd=1, type="fan")dev.off()```Set up correlation matrix for glmmTMB model and check it corresponds to the species labels in the data:```{r}#| eval: true#| echo: true# set up phylogenetic correlation matrixphylo.mat <-vcv(bird.pruned, corr =TRUE) phylo.mat <- phylo.mat[sort(rownames(phylo.mat)), sort(rownames(phylo.mat))]saveRDS(phylo.mat, file ="data/phylo_matrix.rds")# checks # length(colnames(phylo.mat))==length(table(spec.dat$species))# all(head(rownames(phylo.mat))==head(colnames(phylo.mat)))# head(table(spec.dat$species))```### Model 1. Continuous trait```{r}#| eval: true# load dataspec.dat <-read_csv("data/spec_data_for_model.csv")phylo.mat <-readRDS("data/phylo_matrix.rds")# load library library(glmmTMB)# add grouping variable (set it to 1) - this is necessary to fit the glmmTMB modelspec.dat$g <-1```##### Modelling total brightness (B1)The brightness trait is right-skewed (as shown above), which is consistent with multiplicative evolutionary change. To identify an appropriate sampling distribution, we will fit four models with an identical linear predictor and random effects structure:1. Gaussian to model $\log(B1)$\2. Gamma with a log link to model $B1$For each model we will examine simulated standardised residuals using \texttt{DHARMa} to assess nonlinearity and heteroscedasticity. If two or more models perform similarly, we will prefer the model with clearer interpretation on the original scale.<!-- # Little checks --------- --><!-- library(fitdistrplus) --><!-- descdist(spec.dat$B1, discrete=FALSE) --><!-- gamma_dist <- fitdist(spec.dat$B1, "gamma") --><!-- plot(gamma_dist) --><!-- norm_dist <- fitdist(log(spec.dat$B1), "norm") --><!-- plot(norm_dist) --><!-- lnorm_dist <- fitdist(spec.dat$B1, "lnorm") --><!-- plot(lnorm_dist) -->```{r}#| cache: true # Fit models --------------# normaltime_norm <-system.time( m1 <-glmmTMB(log(B1) ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =gaussian(),data = spec.dat) )# Gamma distributiontime_gamma <-system.time( m2 <-glmmTMB(B1 ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =Gamma(link ="log"),data = spec.dat))# Get model info ---------------------------# check whether model has postive definite hessianb1_output <-data.frame(model =c("Gaussian (log)", "Gamma"),convergence =c(m1$sdr$pdHess, m2$sdr$pdHess),runtime =c(time_norm[["elapsed"]], time_gamma[["elapsed"]]))b1_output```<!-- # Lognormal not working get an error that the link function is not supported? --><!-- t_lnorm <- system.time( --><!-- modb1_lnorm <- glmmTMB(B1 ~ body_region * sex + (1|species) + propto(0 + species|g,phylo.mat), family = lognormal(link = "log"), data = spec.dat) --><!-- ) -->```{r}#| eval: false#| echo: falsetable(spec.dat$sex)spec.dat$sex.c <-ifelse(spec.dat$sex=="female", -0.5, 0.5)table(spec.dat$sex.c)# try out fitting random slopem <-glmmTMB(log(B1) ~ sex + (1|species) +propto(0+ sex:species|g,phylo.mat) +propto(0+ species|g,phylo.mat),family =gaussian(),data = spec.dat) m_re <-as.data.frame(confint(m, parm="theta_"))```##### Model diagnosticsLet's check residual plots with the DHARMa packages. First, the log(B1) assuming normal distribution residual checks:```{r}#| cache: true res_gauss <- DHARMa::simulateResiduals(fittedModel = m1)plot(res_gauss)```The model checks assuming Gamma distribution and log link function:```{r}#| cache: true res_gamma <- DHARMa::simulateResiduals(fittedModel = m2)plot(res_gamma)```We found that the Gamma distribution shows improve model fit in the residual plots.##### Model estimatesLet's obtain the estimate of the phylogenetic signal and non-phylogenetic signal for the Gamma model:```{r}#| output: true#| cache: true # get random effect component estimates on SD-scale m2_re <-as.data.frame(confint(m2, parm="theta_"))sigma2_s <- m2_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m2_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signalnp.signal <-1-p.signalnp.signal```On the log mean scale of the Gamma model, the phylogenetic species effect explained 84.2% of the total species level random effect variance i.e. this represents the degree of phylogenetic signal in the overall variance sourced from species. The non-phylogenetic effect explained 15.8% of the total species level random effect variance.We can obtain and compare the residual variances for each model with the following:```{r}# residual variance of gaussian modelm1_res.var <-sigma(m1)^2# residual variance of gamma modelphi <-sigma(m2)^2#phi given on SD-scalescale <-1/phi #derive scale parameterm2_res.var <-trigamma(scale)# printdata.frame(model =c("Gaussian (log)", "Gamma"),residual_variance =c(m1_res.var, m2_res.var))```To assess differences in total brightness between sexes for each body region, we compute marginal means from the fitted model and perform pairwise comparisons between females and males. The resulting ratios (female/male) and their confidence intervals are then plotted to visualise the magnitude and direction of differences across body regions.```{r}#| output: true#| warnings: true#| eval: trueemm_b1 <-emmeans(m2, ~ sex | body_region, type ="response")# get contrasts (ratios) back transformed on response scale with CIb1_sex_diff <-contrast(emm_b1, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")b1_sex_diff## plot pairwise differences between female and male total brightnessb1_sex_diff |>arrange(ratio) |>mutate(body_region =factor(body_region, unique(body_region), ordered =TRUE)) |>ggplot(aes(y = body_region, x = ratio)) +geom_point(size =3, colour ="#7B5EA9") +geom_errorbar(aes(xmin = asymp.LCL, xmax = asymp.UCL), width =0.2, colour ="#7B5EA9") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y ="Body Region", x ="Female / Male Ratio", title ="Brightness") +theme_bw() +theme(legend.position ="none")```Ratios $>1$ indicate greater brightness in females, while ratios $<1$ indicate greater brightness in males.### Model 2. Binary traitBinary colour traits, such as the presence of carotenoid colours, may be associated with distinct evolutionary and ecological drivers.Here we want to model the absence or presence of carotenoid across all body regions accounting for sex and including species- and phylogeny-level random effects. We will use the carotenoid dataset which we obtained earlier.##### Modelling carotenoid presenceWe will fit two models: - Binomial model with a logit link function - Beta binomial model with a logit link function```{r}#| cache: true# Fit models --------------# load datcarot.dat <-read_csv("data/carotenoid_data_for_model.csv")carot.dat$g <-1# binomial modeltime_binom <-system.time( m3 <-glmmTMB(carotenoid ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =binomial(link ="logit"),data = carot.dat))# beta binomialtime_bbinom <-system.time( m4 <-glmmTMB(carotenoid ~ body_region * sex + (1|species) +propto(0+ species|g,phylo.mat),family =betabinomial(link ="logit"),data = carot.dat))# Get model info -----------carotmod_output <-data.frame(model =c("Binomial", "Beta binomial"),convergence =c(m3$sdr$pdHess, m4$sdr$pdHess),runtime =c(time_binom[["elapsed"]], time_bbinom[["elapsed"]]),AIC =c(AIC(m3), AIC(m4)))carotmod_output ```<!-- Test out model --><!-- ```{r} --><!-- carot.dat.bis <- carot.dat[match(unique(carot.dat$species), carot.dat$species),] --><!-- carot.dat.bis$g <- 1 --><!-- m3.bis <- glmmTMB(B1 ~ 1 + (1|species) + propto(0 + species|g,phylo.mat), --><!-- #family = binomial(link = "logit"), --><!-- data = carot.dat.bis) --><!-- m3.bis$sdr$pdHess --><!-- VarCorr(m3.bis) --><!-- confint(m3.bis) --><!-- m3.c <- glmmTMB(carotenoid ~ 1 + (1|species) + propto(0 + species|g,phylo.mat), --><!-- family = binomial(link = "logit"), --><!-- data = carot.dat) --><!-- VarCorr(m3.c) --><!-- confint(m3.c) --><!-- ``` -->##### Model diagnosticsLet's check residual plots with the DHARMa packages. First, let's get the residual plots for the binomial model:```{r}#| cache: true res_binom <- DHARMa::simulateResiduals(fittedModel = m3)plot(res_binom)```Now the model checks assuming zero inflated binomial distribution and log link function:```{r}#| cache: true res_bbinom <- DHARMa::simulateResiduals(fittedModel = m4)plot(res_bbinom)```We found that the binomial model has improved model fit in AIC and in the residual plots.##### Model estimatesLet's obtain the estimate of the phylogenetic signal for this model:```{r}#| cache: true # get random effect component estimates on SD-scalem3_re <-as.data.frame(confint(m3, parm="theta_"))sigma2_s <- m3_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m3_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal```Get model estimated marginals means on the response scale:```{r}# Estimated marginal means on the response scale (odds)emm_m3 <-emmeans(m3, ~ sex | body_region, type ="response")# get contrasts (odds ratios)m3_sex_prob <-contrast(emm_m3, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")# Put into data frame for plottingsex_prob_df <-as.data.frame(m3_sex_prob) |>rename(ratio = odds.ratio, lower.CL = asymp.LCL, upper.CL = asymp.UCL)sex_prob_df # Plotsex_prob_df |>arrange(ratio) |>filter(!(body_region=="back")) |>mutate(body_region =factor(body_region, unique(body_region), ordered =TRUE)) |>ggplot(aes(y = body_region, x = ratio, colour = body_region)) +geom_point(size =3, colour ="#4B6EA5") +geom_errorbar(aes(xmin = lower.CL, xmax = upper.CL), width =0.2, colour ="#4B6EA5") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y ="Body Region", x ="Female / Male Odds Ratio", title ="Sex differences in carotenoid presence") +theme_bw() +theme(legend.position ="none")```### Model 3. Ordinal traitThe ordinal trait is the percentage of body region with carotenoid presence. We will fit a beta-binomial and model to this trait, which is suitable for modeling proportions.##### Modelling carotenoid proportion per body region```{r}#| cache: true # load dataord.dat <-read_csv("data/ordinal_data_for_model.csv")ord.dat$g <-1# add grouping variable # fit model time_ord <-system.time( m5 <-glmmTMB(prop_carotenoid ~ sex + (1|species) +propto(0+ species|g,phylo.mat),family =ordbeta(),data = ord.dat))# outputdata.frame(model ="Ordinal beta",convergence = m5$sdr$pdHess,runtime = time_ord[["elapsed"]],AIC =AIC(m5))```<!-- time_norm <- system.time( --><!-- m6 <- glmmTMB(prop_carotenoid ~ sex + (1|species) + propto(0 + species|g,phylo.mat), --><!-- family = gaussian(), --><!-- data = ord.dat) --><!-- ) --><!-- # output --><!-- data.frame( --><!-- model = c("Ordinal beta", "gaussian"), --><!-- convergence = c(m5$sdr$pdHess, m6$sdr$pdHess), --><!-- runtime = c(time_ord[["elapsed"]], time_norm[["elapsed"]]), --><!-- AIC = c(AIC(m5), AIC(m6)) --><!-- ) -->##### Model diagnosticsLook at residual diagnostic plots:```{r}#| cache: true res_ord <- DHARMa::simulateResiduals(fittedModel = m5)plot(res_ord)```<!-- res_norm <- DHARMa::simulateResiduals(fittedModel = m6) --><!-- plot(res_norm) -->##### Model estimatesPhylogenetic signal:```{r}#| output: true#| cache: true # get random effect component estimates on SD-scalem5_re <-as.data.frame(confint(m5, parm="theta_"))sigma2_s <- m5_re$Estimate[1]^2# non-phylogenetic variance est.sigma2_p <- m5_re$Estimate[2]^2# phylogenetic variance est.p.signal.m5 <- sigma2_p / (sigma2_p + sigma2_s)p.signal.m5```Now look at the difference in the proportion of carotenoid colour between females and males from the ordinal beta model (m5), and plot the model-based estimate with its 95% confidence interval on the response scale.```{r}emm_m5 <-emmeans(m5, ~ sex, type ="response")# pairwise contrast: Female vs Malem5_sex_OR <-contrast(emm_m5, method ="pairwise") |>summary(infer =TRUE, type ="response")# get summary with CIsex_OR <-as.data.frame(m5_sex_OR) |>rename(OR = odds.ratio, lower.CL = asymp.LCL, upper.CL = asymp.UCL)sex_OR```## Case study 2: Evolution of plant hydraulic traits We re-analysed the published study of Sanchez-Martinez et al. (2020) on the evolution of plant hydraulic traits using phylogenetic generalized linear mixed models (PGLMMs). The original study used a Bayesian MCMC approach to fit the models with the `MCMCglmm` package. Here we will use the `glmmTMB` package to fit the models with different sampling distributions and compare with `MCMCglmm`. Again all results are for illustrative purpose only and we do not intend to infer any biological or ecological conclusions from the following results.[credits: Chanya_B / stock.adobe.com]{style="color:grey; font-size:0.9em;"}### Data overview```{r}# Load plant hydraulic traits datasethydra.dat <-read_csv("data/HydraEvol2020.csv")# Have a look at distribution of traitshist(hydra.dat$Ks, breaks =50, main ="Distribution of Ks", xlab ="Ks")boxplot(hydra.dat$Ks ~ hydra.dat$group, main ="Boxplot of Ks by group", xlab ="Group", ylab ="Ks")hist(hydra.dat$P50, breaks =50, main ="Distribution of P50", xlab ="P50")boxplot(hydra.dat$P50 ~ hydra.dat$group, main ="Boxplot of P50 by group", xlab ="Group", ylab ="P50")```### Phylogenetic tree```{r}# Load tree plant.tree <- ape::read.tree("data/genus-level_phylogeny.tre")# plot treeplotTree(plant.tree, ftype="i", fsize=0.4, lwd=1, type="fan")dev.off()```### 1. Model for kSPlant species with high saturated hydraulic conductivity (kS) are able to transport water through their xylem more efficiently, and are therefore characterised as highly efficient in water transport.First let's set up the models for glmmTMB: - Log(response) Gaussian.- Gamma model.First, let's set up the phylogenetic correlation matrix for the `glmmTMB` model and the inverse matrix for `MCMCglmm`. We want to obtain a phylogenetic matrix that match all genera in the Ks dataset.m We will therefore remove all missing entries for the `Ks` trait and prune the tree and the dataset to have the same genera in both. ```{r}#| cache: TRUE# create dat for ks (remove all missing values)ks.dat <-subset(hydra.dat, !is.na(Ks))all(is.na(ks.dat$genus))# get common genus dataset and treecommon.genus <-intersect(plant.tree$tip.label, ks.dat$genus)# Prune treeplant.tree.pruned <- ape::drop.tip(plant.tree, setdiff(plant.tree$tip.label, common.genus))#length(plant.tree.pruned$tip.label)# Prune datasetks.dat <- ks.dat[ks.dat$genus %in% common.genus, ]#length(table(ks.dat$genus))# check whether names match in data and treecheck <- geiger::name.check(plant.tree.pruned, ks.dat$genus, sort(plant.tree.pruned$tip.label))# set up phylogenetic correlation matrixphylo.mat.p <-vcv(plant.tree.pruned, corr =TRUE) phylo.mat.p <- phylo.mat.p[sort(rownames(phylo.mat.p)), sort(rownames(phylo.mat.p))]# set up inverse phylogenetic correlation matrix for MCMCglmmphylo.inv <- MCMCglmm::inverseA(plant.tree.pruned, nodes ="TIPS")$Ainvphylo.inv <- phylo.inv[sort(rownames(phylo.inv)),]# checks # length(colnames(phylo.mat.p))==length(table(ks.dat$genus))# all(head(rownames(phylo.mat.p))==head(colnames(phylo.mat.p)))# length(rownames(phylo.inv))==length(table(ks.dat$genus))# head(table(ks.dat$genus))``````{r}#| cache: true # make sure to add grouping factor for glmmTMBks.dat$g <-1ks.dat$genus <-as.factor(ks.dat$genus)# Fit models --------------# normaltime_norm <-system.time( m6 <-glmmTMB(log(Ks) ~ group + (1|genus) +propto(0+ genus|g, phylo.mat.p),family =gaussian(),REML=TRUE,data = ks.dat) )# Gamma distributiontime_gamma <-system.time( m7 <-glmmTMB(Ks ~ group + (1|genus) +propto(0+ genus|g,phylo.mat.p),family =Gamma(link ="log"),data = ks.dat))```Now, we will fit the Gaussian Log(Ks) model with MCMCglmm:```{r}#| eval: FALSE#| echo: FALSE# set priorsprior.hydra <-list(R =list(V =1, nu =0.002), G =list(G1 =list(V =1, nu =0.002), G2 =list(V =1, nu =0.002))) # two random effects, phylogeny and genus# set up random effects for MCMCglmmks.dat$phylo <- ks.dat$genusall(rownames(phylo.inv) ==levels(ks.dat$phylo))# check if positive definite kappa(phylo.inv) # should be > 0.1try(chol(phylo.inv), silent =TRUE)# normaltime_mcmc.norm <-system.time( m6.mcmc <-MCMCglmm(log(Ks) ~ group, random =~genus + phylo,family ="gaussian",ginverse =list(phylo = phylo.inv),prior = prior.hydra,data = ks.dat,verbose=FALSE,nitt=13000,burnin=3000,thin=10))# set priors for two random effectsprior <-list(G=list(G1=list(V=1,nu=1,alpha.mu=0,alpha.V=1000), G2=list(V=1,nu=1,alpha.mu=0,alpha.V=1000)),R=list(V=1,nu=0.02))# fit MCMCglmm modeltime.mcmc <-system.time({ model_mcmc <-MCMCglmm(yi ~ x,random =~species + phylo,family ="gaussian",ginverse =list(phylo = phylo.prec.mat),prior = prior,data = dat,verbose =FALSE,nitt =303000, # increase default by x30burnin =3000, # defaultthin =10) # default})# check convergencem6.mcmc.conv <-heidel.diag(m6.mcmc$Sol[,1])$pvalue >0.05# check convergence of fixed effect``````{r}#| eval: FALSE#| echo: FALSE# Get model info ---------------------------# check whether model has postive definite hessianks_output <-data.frame(package =c("glmmTMB", "MCMCglmm", "glmmTMB"),model =c("Gaussian (log)", "Gaussian (log)", "Gamma"),convergence =c(m6$sdr$pdHess, m6.mcmc.conv , m7$sdr$pdHess),runtime =c(time_norm[["elapsed"]], time_mcmc.norm, time_gamma[["elapsed"]]))ks_output```#### Diagnostic plotsLet's check residual plots with the DHARMa packages. First, the log(Ks) assuming normal distribution residual checks:```{r}#| cache: true res_norm <- DHARMa::simulateResiduals(fittedModel = m6)plot(res_norm)```The model checks assuming Gamma distribution and log link function:```{r}#| cache: true res_gamma2 <- DHARMa::simulateResiduals(fittedModel = m7)plot(res_gamma2)```Residual diagnostics are relatively similar.#### ResultsGet phylogenetic signal from glmmTMB Gaussian model: ```{r}#| output: true#| cache: true # get random effect component estimates on SD-scale m6_re <-as.data.frame(confint(m6, parm="theta_"))sigma2_s <- m6_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m6_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal```We found that the phylogenetic species effect explained 83.9% of the total species level random effect variance, and hence the non-phylogenetic effect explained only 16.1% of the total species level random effect variance.Get model estimated marginals means on the response scale:```{r}# Estimated marginal means on the response scale (odds)emm_m6 <-emmeans(m6, ~ group, type ="response")emm_m6# Get the contrast between the two groupsgroup_diff <-contrast(emm_m6, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")# Plotggplot(group_diff, aes(y = contrast, x = ratio)) +geom_point(size =3, colour ="#6BAF92") +geom_errorbar(aes(xmin = lower.CL, xmax = upper.CL), width =0.2, colour ="#6BAF92") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y =NULL, x ="",title ="kS contrast: Angiosperms vs Gymnosperms") +theme_bw() +theme(legend.position ="none")```### 2. Model for P50P50 represents the level of water stress at which a tree experiences a 50% loss of hydraulic conductivity. More negative P50 values indicate that trees can withstand greater hydraulic stress before suffering hydraulic damage, so communities with lower P50s are considered more resilient to increasing water stress (Trugman et al., 2020).First let's set up the models for glmmTMB:- Try gamma model - Log(response) gaussianFirst, le's a phylogenetic matrix that match all genera in the P50 dataset. We will therefore remove all missing entries for the `Ks` trait and prune the tree and the dataset to have the same genera in both. ```{r}#| cache: TRUE# create dat for ks (remove all missing values)p50.dat <-subset(hydra.dat, !is.na(P50))# create negative P50 variablep50.dat$negP50 <-- p50.dat$P50# get common genus across dataset and treecommon.genus <-intersect(plant.tree$tip.label, p50.dat$genus)# Prune datasetp50.dat <- p50.dat[p50.dat$genus %in% common.genus, ]# Prune treeplant.tree.pruned <- ape::drop.tip(plant.tree, setdiff(plant.tree$tip.label, common.genus))# check whether names match in data and treecheck <- geiger::name.check(plant.tree.pruned, p50.dat$genus, sort(plant.tree.pruned$tip.label))# set up phylogenetic correlation matrixphylo.mat.p <-vcv(plant.tree.pruned, corr =TRUE) phylo.mat.p <- phylo.mat.p[sort(rownames(phylo.mat.p)), sort(rownames(phylo.mat.p))]# set up inverse phylogenetic correlation matrix for MCMCglmmphylo.mat.p.inv <-solve(phylo.mat.p)# checks # length(colnames(phylo.mat.p))==length(table(p50.dat$genus))# all(head(rownames(phylo.mat.p))==head(colnames(phylo.mat.p)))# head(table(p50.dat$genus))``````{r}#| cache: true # make sure to add grouping factor for glmmTMBp50.dat$g <-1p50.dat$genus <-as.factor(p50.dat$genus)# Fit models --------------# normaltime_norm <-system.time( m8 <-glmmTMB(log(negP50) ~ group + (1|genus) +propto(0+ genus|g, phylo.mat.p),family =gaussian(),REML=TRUE,data = p50.dat) )# Gamma distributiontime_gamma <-system.time( m9 <-glmmTMB(negP50 ~ group + (1|genus) +propto(0+ genus|g,phylo.mat.p),family =Gamma(link ="log"),data = p50.dat))```Same models with MCMCglmm```{r}```#### Diagnostic plotsLet's check residual plots with the DHARMa packages. First, the log(Ks) assuming normal distribution residual checks:```{r}#| cache: true res_norm <- DHARMa::simulateResiduals(fittedModel = m8)plot(res_norm)```The model checks assuming Gamma distribution and log link function:```{r}#| cache: true res_gamma <- DHARMa::simulateResiduals(fittedModel = m9)plot(res_gamma)```Residual diagnostics are relatively similar, for both distributions so we keep the Gaussian log(response) model specification.#### ResultsGet phylogenetic signal from glmmTMB Gaussian model: ```{r}#| output: true#| cache: true # get random effect component estimates on SD-scale m9_re <-as.data.frame(confint(m9, parm="theta_"))sigma2_s <- m9_re$Estimate[1]^2# non-phylogenetic variancesigma2_p <- m9_re$Estimate[2]^2# phylogenetic variance estimatep.signal <- sigma2_p / (sigma2_p + sigma2_s)p.signal```We found that the phylogenetic species effect explained 83.9% of the total species level random effect variance, and hence the non-phylogenetic effect explained only 16.1% of the total species level random effect variance.Get model estimated marginals means on the response scale:```{r}# Estimated marginal means on the response scale (odds)emm_m9 <-emmeans(m9, ~ group, type ="response")emm_m9# Get the contrast between the two groupsgroup_diff <-contrast(emm_m9, method ="pairwise", reverse=TRUE) |>summary(infer =TRUE, type ="response")# Plotggplot(group_diff, aes(y = contrast, x = ratio)) +geom_point(size =3, colour ="#6BAF92") +geom_errorbar(aes(xmin = asymp.LCL, xmax = asymp.UCL), width =0.2, colour ="#6BAF92") +geom_vline(xintercept =1, linetype ="dashed", colour ="grey40") +labs(y =NULL, x ="",title ="kS contrast: Angiosperms vs Gymnosperms") +theme_bw() +theme(legend.position ="none")```# ReferencesBrooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J.,Machler, M., & Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R Journal, 9 (2), 378–400. https://doi.org/10.32614/RJ-2017-066Dunn, P. O., Armenta, J. K., & Whittingham, L. A. (2015). Natural and sexual selection act on different axes of variation in avian plumage color. Science Advances, 1(2), e1400155. https://doi.org/10.1126/sciadv.1400155Hadfield, J. D. (2024, May). MCMCglmm: MCMC Generalised Linear Mixed Models. Retrieved October 7, 2024, from https://cran.r-project.org/web/packages/MCMCglmm/index.htmlHill, G. E., & McGraw, K. J. (2006). Bird Coloration, Volume 2: Function and Evolution. Harvard University Press. https://doi.org/10.2307/j.ctv22jnr8kKristensen, K., & McGillycuddy, M. (2025). Covariance structures with glmmTMB. https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.htmlMcGillycuddy, M. (2023). Model-Based Assessment of Treatment Effect in Multivariate Abundance Data[Doctoral dissertation, The University of New South Wales].Sanchez-Martinez, P., Martinez-Vilalta, J., Dexter, K. G., Segovia, R. A., & Mencuccini, M. (2020). Adaptation and coordinated evolution of plant hydraulic traits. Ecology Letters, 23 (11), 1599–1610. https://doi.org/10.1111/ele.13584Trugman, A. T., Anderegg, L. D. L., Shaw, J. D., & Anderegg, W. R. L. (2020). Trait velocities reveal that mortality has driven widespread coordinated shifts in forest hydraulic trait composition. Proceedings of the National Academy of Sciences, 117(15), 8532–8538. https://doi.org/10.1073/pnas.1917521117# Session information```{r}#| label: Reproducibility-SessionInfo-R-environment#| fig-align: "center"#| out-width: '100%'#| results: asis#| message: false#| warnings: falselibrary(sessioninfo)library(details)si <-session_info()si$packages <- si$packages # |> filter(package %in% c("metafor", "ape", "clubSandwich", "Matrix", "corpcor", "dplyr", "kableExtra", "xtable", "rotl", "Hmisc", "lattice"))details(si, summary ='Current session info', open =FALSE)```> **Note**> This site may still be updated, and could be small mistakes.<a href="https://www.flaticon.com/free-icons/bird" title="bird icons">Bird icon created by Smashicons - Flaticon </a>